Abstract

Coding theorems and (strong) converses for memoryless quantum communication
channels and quantum sources are proved: for the quantum source the coding theorem
is reviewed, and the strong converse proven. For classical information transmission
via quantum channels we give a new proof of the coding theorem, and prove the strong
converse, even under the extended model of nonstationary channels. As a by-product we
obtain a new proof of the famous Holevo bound. Then multi-user systems are investigated,
and the capacity region for the quantum multiple access channel is determined.
The last chapter contains a preliminary discussion of some models of compression of
correlated quantum sources, and a proposal for a program to obtain operational meaning for
quantum conditional entropy. An appendix features the introduction of a notation and
calculus of entropy in quantum systems.

Introduction



In the present thesis problems of information in quantum systems are discussed, mainly in
the context of coding problems of various kinds. Thus we follow a line of research initiated
by Shannon (1948), where informational-operational meaning was lent to terms like
entropy, information, capacity, building on models of a stochastic nature. This is where
quantum theory enters, which is generally understood to be a stochastic theory (starting
with Born (1926), now in any modern textbook, e.g. Peres (1995)). A stochastic theory
however of a novel type: it was soon understood that the statistical predictions of quantum
theory cannot be described in ordinary ("classical") stochastic theories (Einstein et
al. (1935), Bell (1964)), and this is formally mirrored in the necessity to introduce a
"noncommutative probability".

These observations led physicists during the 1960s to speculate about the role of
quantum probabilism in information theory: cf. the works of Gordon (196?), Levitin
(1969), and Forney (1963). Holevo (1973) however is to be credited with founding
an appropriate mathematical theory (after a first step by Stratonovich (1966)) and
proving the justly named Holevo bound on quantum channel capacities. This work was
extended subsequently by Holevo (197?). Apart from this and formulating the definite
model (Holevo (1977), relying on earlier clarifying work on quantum stochastics by
Ludwig (1954), Holevo, and Davies & Lewis) efforts concentrated on the analysis of
specific restrictive or highly symmetric situations.

Then progress in foundations ceased, until the stormy revival and extension of the
subject in 1994, which year saw two important contributions: the quantum algorithm of
Shor (1994) for factoring integers, proving the power of quantum information processing,
and by Schumacher (1995) the successful interpretation of von Neumann entropy as
asymptotic source coding rate for quantum information (at the same time establishing
quantum information at all as a quantity, distinguished from what is now called "classical
information". The reader should be aware however that it was known from the early days
of quantum theory on that operationally quantum states are "more" than the knowledge
we can acquire about them. A true expression of this qualitative distinction is the
no-cloning theorem of Wootters & Zurek (1982), stating that quantum states cannot be
duplicated, i.e. "copied", whereas classical data obviously can).

Both works continue to exert a tremendous influence on the new thinking about quantum
information theory. After that soon the coding theorem complementing the Holevo
bound was proved (Hausladen et al. (1997), Holevo (1998a), Schumacher & Westmoreland
(1997)), and today we face a variety of classical, quantum, or mixed information
models, some of which at least we understand.

The present work opens and closes with quantum information: beginning with a review
of Schumacher's quantum source coding, to which we contribute the strong converse,
ending with some speculations about multiple quantum source coding. In between we
deal with transmission of classical information via quantum channels. Here our achievements
include new proofs of the channel coding theorem (which is new for nonstationary
channels), and the completely new strong converse (independently Ogawa & Nagaoka
(1998) have proved the strong converse for stationary and finite alphabet channels by a
different method), estimates on the reliability function, and, as a by-product, a new
proof of the Holevo bound. In the third chapter we determine the capacity region of
quantum multiple access channels, using our results on multiple quantum source coding
with side information from the fourth chapter, where also a number of simple estimates
on the rate region and some examples are discussed. Among the positive results of this
part are the weak subadditivity for the so-called coherent information (while the ordinary
subadditivity one would expect fails), and the determination of the rate region for
multiple classical source coding with quantum side information at the decoder. Thus we
completely skirt all questions of channel coding of quantum information and noise protection
of quantum information by quantum error correcting codes, these issues only entering
implicitly in the discussion of multiple quantum source coding. Also we choose to stay
with discrete (i.e. finite, or, in the quantum case, finite-dimensional) and memoryless
systems: this is not an essential restriction for our results, but allows us to work consistently
with techniques of a combinatorial flavor and to skip technicalities (such as finite
variance conditions etc.) which, at the present state of techniques, could not have been
avoided. The restriction is further justified by the ignorance on many questions even in
this somewhat narrow setting. An appendix contains the necessary elements of quantum
probabilistic theory and a calculus of entropy and information in quantum systems. It
will be referred to for any concept of that field needed in the main text.

Parts of this work have been pre-published in the author's work: appendix A is distilled
from its (very inadequate) predecessor Winter (1998c), chapter I is from Winter
(1999c), and the results of chapters II and III were reported by Winter (1998b), Winter
(1999a), Winter (1999b), and Winter (199?).



I have tried to give due credit (or else a reasonable reference) to any result of some 
importance, especially in the main text. If there is no credit it is implicit that I am 
the inventor. However this does not apply to a number of propositions of less weight, 
especially in the appendix, which I found on my own but which I regard as "folklore", 
and thus never tried to trace them back to an original inventor.

Chapter I

Quantum Source Coding 



In this chapter quantum information and the notion of its compression are introduced.
To prove the corresponding coding theorem and strong converse basic techniques are
developed: a relation between fidelity and trace norm distance, different notions of typical
subspace, and an estimate on general $\eta$-shadows. Finally we comment on the relation to
classical source coding.

Models of quantum data compression 

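Fix the complex Hilbert space $\mathcal{H}$, $d = \dim\mathcal{H} < \infty$. $\mathcal{L}(\mathcal{H})$ denotes the algebra of (bounded)
linear operators of $\mathcal{H}$, $\mathcal{L}(\mathcal{H})_*$ its predual under the trace pairing.¹

A (discrete memoryless) quantum source (q-DMS) is a pair $(\mathcal{P},P)$ with a finite set
$\mathcal{P}\subset\mathcal{L}(\mathcal{H})_*$ of pure states on $\mathcal{L}(\mathcal{H})$ and a p.d. $P$ on $\mathcal{P}$. The average state of the source is

$$P\mathcal{P} = \sum_{\pi\in\mathcal{P}} P(\pi)\,\pi.$$

An n-block code for the q-DMS $(\mathcal{P},P)$ is a pair $(\varepsilon_*,\delta_*)$ where $\varepsilon_* : \mathcal{P}^n \to \mathcal{L}(\mathcal{K})_*$ maps
$\mathcal{P}^n$ into the states on $\mathcal{L}(\mathcal{K})$ (with some Hilbert space $\mathcal{K}$), and $\delta_* : \mathcal{L}(\mathcal{K})_* \to \mathcal{L}(\mathcal{H})_*^{\otimes n}$ is
trace preserving and completely positive (i.e. it is a physical state transformation, see
appendix A).

We say that $(\varepsilon_*,\delta_*)$ is quantum encoding if $\varepsilon_*$ is the restriction to $\mathcal{P}^n$ of a trace
preserving and completely positive map $\varepsilon_* : \mathcal{L}(\mathcal{H})_*^{\otimes n} \to \mathcal{L}(\mathcal{K})_*$. If there is no condition
on $\varepsilon_*$ we say that $(\varepsilon_*,\delta_*)$ is arbitrary encoding.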

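For an n-block code $(\varepsilon_*,\delta_*)$ define

1. the (average) fidelity

$$F = F(\varepsilon_*,\delta_*) = \sum_{\pi^n\in\mathcal{P}^n} P^n(\pi^n)\cdot\mathrm{Tr}\big((\delta_*\varepsilon_*\pi^n)\pi^n\big),$$

¹For these notions (algebras, states, operations, trace pairing, trace norm, etc.) see appendix A,
section Quantum systems.

2. the (average) distortion

$$D = D(\varepsilon_*,\delta_*) = \sum_{\pi^n\in\mathcal{P}^n} P^n(\pi^n)\cdot\frac{1}{2}\left\|\delta_*\varepsilon_*\pi^n - \pi^n\right\|_1,$$

3. the entanglement fidelity (see Schumacher (1996))

$$F_e = F_e(\varepsilon_*,\delta_*) = \mathrm{Tr}\Big(\big((\delta_*\varepsilon_*\otimes\mathrm{id})\Psi_{P\mathcal{P}}^{\otimes n}\big)\Psi_{P\mathcal{P}}^{\otimes n}\Big),$$

where $\Psi_{P\mathcal{P}}$ is a purification of $P\mathcal{P}$, i.e. it is a pure state on an extended system
(by tensor product with some space $\mathcal{H}_0$), and $P\mathcal{P} = \Psi_{P\mathcal{P}}|_{\mathcal{L}(\mathcal{H})}$ (cf. Schumacher
(1996) who proves that $F_e$ does not depend on the purification chosen). Note that
this makes sense only if $(\varepsilon_*,\delta_*)$ is quantum encoding.

Observe that generally $\rho^n = \rho_1\otimes\cdots\otimes\rho_n$ denotes a product state of n factors, while
$\rho^{\otimes n} = \rho\otimes\cdots\otimes\rho$ is the n-fold tensor power of $\rho$.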

Theorem I.1

$$D^2 \le 1-F \le D \quad\text{and}\quad 1-F \le 1-F_e.$$

Proof. For the last inequality see Schumacher (1996). The first double inequality follows
from lemma I.3 below by linearity, and by convexity of the square function. □



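A digression on fidelity First note that both $D(\rho,\sigma) = \frac{1}{2}\|\rho-\sigma\|_1$ and $1-F(\rho,\sigma) =
1-\mathrm{Tr}(\rho\sigma)$ obey a triangle inequality:

$$\|\rho_1\otimes\rho_2 - \sigma_1\otimes\sigma_2\|_1 \le \|\rho_1-\sigma_1\|_1 + \|\rho_2-\sigma_2\|_1$$

and

$$1 - F(\rho_1\otimes\rho_2,\sigma_1\otimes\sigma_2) \le 1 - F(\rho_1,\sigma_1) + 1 - F(\rho_2,\sigma_2).$$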



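Lemma I.2 (Pure state) Let $\rho = |\psi\rangle\langle\psi|$ and $\sigma = |\phi\rangle\langle\phi|$ be pure states. Then

$$1-F(\rho,\sigma) = D(\rho,\sigma)^2.$$

Proof. W.l.o.g. we may assume $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$ and $|\phi\rangle = \alpha|0\rangle - \beta|1\rangle$ ($|\alpha|^2 + |\beta|^2 = 1$).
A straightforward calculation shows $F = (|\alpha|^2 - |\beta|^2)^2$, and $D = 2|\alpha\beta|$. Now

$$1-F = 1-(|\alpha|^2-|\beta|^2)^2
     = (1+|\alpha|^2-|\beta|^2)(1-|\alpha|^2+|\beta|^2)
     = 2|\alpha|^2\cdot 2|\beta|^2 = 4|\alpha\beta|^2 = D^2. \qquad\Box$$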
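The identity of the pure state lemma, $1 - F = D^2$ with $D = \frac{1}{2}\|\rho-\sigma\|_1$, can be checked numerically for qubits. The following Python sketch is only an illustration and not part of the original argument; the function names and the parametrization of the two states by angles are assumptions made for the example. The trace norm is computed independently of the fidelity, from the closed-form eigenvalues of a Hermitian 2x2 matrix.

```python
import cmath
import math

def pure_state_fidelity(psi, phi):
    """F = |<psi|phi>|^2 for two normalized vectors."""
    overlap = sum(a.conjugate() * b for a, b in zip(psi, phi))
    return abs(overlap) ** 2

def trace_distance_qubit(psi, phi):
    """D = (1/2) || |psi><psi| - |phi><phi| ||_1 for two qubit vectors."""
    # Entries of the Hermitian 2x2 matrix M = |psi><psi| - |phi><phi|.
    m00 = abs(psi[0]) ** 2 - abs(phi[0]) ** 2
    m11 = abs(psi[1]) ** 2 - abs(phi[1]) ** 2
    m01 = psi[0] * psi[1].conjugate() - phi[0] * phi[1].conjugate()
    # Closed-form eigenvalues of a Hermitian 2x2 matrix.
    mean = (m00 + m11) / 2
    radius = math.sqrt(((m00 - m11) / 2) ** 2 + abs(m01) ** 2)
    return 0.5 * (abs(mean + radius) + abs(mean - radius))

def lemma_gap(theta, chi, phase):
    """Return |(1 - F) - D^2| for two qubit states given by angles (hypothetical parametrization)."""
    psi = (complex(math.cos(theta)), complex(math.sin(theta)))
    phi = (complex(math.cos(chi)), math.sin(chi) * cmath.exp(1j * phase))
    f = pure_state_fidelity(psi, phi)
    d = trace_distance_qubit(psi, phi)
    return abs((1 - f) - d ** 2)
```

Calling `lemma_gap` with arbitrary angles should return a value at the level of floating point rounding.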
Lemma I.3 (Mixed state) Let $\sigma$ be an arbitrary mixed state (and $\rho$ pure as above). Then

$$D \ge 1-F \ge D^2.$$

Proof. Write $\sigma = \sum_j q_j\pi_j$ with pure states $\pi_j$. Then

$$1 - F(\rho,\sigma) = \sum_j q_j\big(1 - F(\rho,\pi_j)\big) = \sum_j q_j\left(\frac{1}{2}\|\rho-\pi_j\|_1\right)^2
\ge \left(\sum_j q_j\,\frac{1}{2}\|\rho-\pi_j\|_1\right)^2 \ge \left(\frac{1}{2}\|\rho-\sigma\|_1\right)^2.$$

Conversely: extend $\rho$ to the observable $(\rho, 1-\rho)$ and consider the quantum operation

$$\kappa_* : \sigma \mapsto \rho\sigma\rho + (1-\rho)\sigma(1-\rho).$$

Then (with monotonicity of $\|\cdot\|_1$ under quantum operations, see appendix A, section
Quantum systems)

$$2D = \|\rho-\sigma\|_1 \ge \|\kappa_*\rho - \kappa_*\sigma\|_1 = \|\rho - \kappa_*\sigma\|_1$$

(since $\rho = \kappa_*\rho$). Hence with $F = \mathrm{Tr}(\sigma\rho)$

$$2D \ge \|(1-F)\rho - \mathrm{Tr}(\sigma(1-\rho))\pi\|_1 = (1-F) + (1-F) = 2(1-F)$$

for a state $\pi$ supported in $1-\rho$. □

Observe that the inequalities of this lemma still hold if only $\sum_j q_j \le 1$. To close our
digression we want to note two useful lemmata concerning "good" measurements:

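Lemma I.4 (Tender operator) Let $\rho$ be a state, and $X$ a positive operator with $X \le 1$
and $1 - \mathrm{Tr}(\rho X) \le \lambda \le 1$. Then

$$\left\|\rho - \sqrt{X}\rho\sqrt{X}\right\|_1 \le \sqrt{8\lambda}.$$

Proof. Let $Y = \sqrt{X}$ and write $\rho = \sum_k p_k\pi_k$ with one-dimensional projectors $\pi_k$ and
weights $p_k \ge 0$. Now

$$\begin{aligned}
\|\rho - Y\rho Y\|_1^2 &\le \Big(\sum_k p_k\|\pi_k - Y\pi_k Y\|_1\Big)^2 \\
&\le \sum_k p_k\|\pi_k - Y\pi_k Y\|_1^2 \\
&\le 4\sum_k p_k\big(1 - \mathrm{Tr}(\pi_k Y\pi_k Y)\big) \\
&\le 8\sum_k p_k\big(1 - \mathrm{Tr}(\pi_k Y)\big) \\
&= 8(1 - \mathrm{Tr}(\rho Y)) \\
&\le 8(1 - \mathrm{Tr}(\rho X)) \le 8\lambda
\end{aligned}$$

by triangle inequality, convexity of $x \mapsto x^2$, lemma I.3, $1 - x^2 \le 2(1-x)$, and $X \le Y$. □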

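Lemma I.5 (Tender measurement) Let $\rho_a$ ($a \in A$) be a family of states on $\mathcal{H}$, and $D$
an observable indexed by $B$. Let $\varphi : A \to B$ be a map and $\lambda > 0$ such that

$$\forall a\in A\quad 1 - \mathrm{Tr}(\rho_a D_{\varphi(a)}) \le \lambda$$

(i.e. the observable identifies $\varphi(a)$ from $\rho_a$ with maximal error probability $\lambda$). Then the
canonically corresponding quantum operation

$$D_{\mathrm{int}*} : \rho \mapsto \sum_{b\in B}\sqrt{D_b}\,\rho\,\sqrt{D_b}$$

disturbs the states $\rho_a$ only a little:

$$\forall a\in A\quad \|\rho_a - D_{\mathrm{int}*}\rho_a\|_1 \le \sqrt{8\lambda} + \lambda.$$

Furthermore the total observable operation²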

beB 

satisfies 

Vae^ II ® Pa- Aot*Pa||l < A. 

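Proof. An easy calculation:

$$\begin{aligned}
\|\rho_a - D_{\mathrm{int}*}\rho_a\|_1 &\le \left\|\rho_a - \sqrt{D_{\varphi(a)}}\rho_a\sqrt{D_{\varphi(a)}}\right\|_1 + \sum_{b\ne\varphi(a)}\left\|\sqrt{D_b}\rho_a\sqrt{D_b}\right\|_1 \\
&= \left\|\rho_a - \sqrt{D_{\varphi(a)}}\rho_a\sqrt{D_{\varphi(a)}}\right\|_1 + \sum_{b\ne\varphi(a)}\mathrm{Tr}(\rho_a D_b) \\
&\le \sqrt{8\big(1-\mathrm{Tr}(\rho_a D_{\varphi(a)})\big)} + 1 - \mathrm{Tr}(\rho_a D_{\varphi(a)}) \\
&\le \sqrt{8\lambda} + \lambda,
\end{aligned}$$

using triangle inequality and lemma I.4. The second part (which actually implies the
first) is similar. □

Remark I.6 If we modify the statement of the lemma to that the average error in identifying
$\varphi(a)$ from $\rho_a$ should be at most $\lambda$ (relative to a distribution on $A$), then also the
distortion bound of the lemma holds, on average.

²See also appendix A, section Common tongue, for $D_{\mathrm{int}}$ and $D_{\mathrm{tot}}$.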
Let us return to the source coding schemes: The n-block code $(\varepsilon_*,\delta_*)$ is called an
$(n,\lambda)_F$-code if $1 - F(\varepsilon_*,\delta_*) \le \lambda$. Similarly an $(n,\lambda)_{F_e}$-code is defined. The n-block code
$(\varepsilon_*,\delta_*)$ is called an $(n,\lambda)_D$-code if $D(\varepsilon_*,\delta_*) \le \lambda$.

The rate of an n-block code $(\varepsilon_*,\delta_*)$ is defined as $R(\varepsilon_*,\delta_*) = \frac{1}{n}\log\dim\mathcal{K}$.³
From the previous theorem it is clear that the most restrictive model is where we
have to find an $(n,\lambda)_{F_e}$-code with quantum encoding, whereas the most powerful model
is where we have to find an $(n,\lambda)_F$-code with arbitrary encoding (equivalently we may
use $D$). Now we define for a q-DMS $(\mathcal{P},P)$

1. the $\lambda$-(quantum,$F_e$)-rate as

$$R_{q,F_e}(\lambda) = \limsup_{n\to\infty}\,\min\{R(\varepsilon_*,\delta_*) : (\varepsilon_*,\delta_*) \text{ an } (n,\lambda)_{F_e}\text{-code with qu. encoding}\},$$

2. the $\lambda$-(quantum,$F$)-rate as

$$R_{q,F}(\lambda) = \limsup_{n\to\infty}\,\min\{R(\varepsilon_*,\delta_*) : (\varepsilon_*,\delta_*) \text{ an } (n,\lambda)_F\text{-code with qu. encoding}\},$$

3. the $\lambda$-(arbitrary,$F$)-rate as

$$R_{a,F}(\lambda) = \limsup_{n\to\infty}\,\min\{R(\varepsilon_*,\delta_*) : (\varepsilon_*,\delta_*) \text{ an } (n,\lambda)_F\text{-code with arb. encoding}\}.$$

Despite our lot of definitions the situation turns out to be quite simple:

Theorem I.7 For all $\lambda \in (0,1)$ the three $\lambda$-rates of the q-DMS $(\mathcal{P},P)$ are equal to the
von Neumann entropy of the ensemble $(\mathcal{P},P)$:

$$R_{q,F_e}(\lambda) = R_{q,F}(\lambda) = R_{a,F}(\lambda) = H(P\mathcal{P}),$$

where $H(\rho) = -\mathrm{Tr}(\rho\log\rho)$ (see appendix A, section Entropy and divergence).

Proof. Between the first two members of the chain we have "$\ge$" by theorem I.1, between
the second and the third "$\ge$" is obvious. $R_{q,F_e}(\lambda) \le H(P\mathcal{P})$ follows from the coding
theorem I.16. Finally $R_{a,F}(\lambda) \ge H(P\mathcal{P})$ follows from the strong converse theorem I.19. □



Typical subspaces and shadows 

Let $P$ be a p.d. on the set $\mathcal{X}$, with $|\mathcal{X}| = a < \infty$. Define for $\alpha > 0$ the set

$$\mathcal{T}^n_{V,P,\alpha} = \left\{x^n \in \mathcal{X}^n : \forall x\in\mathcal{X}\ \ |N(x|x^n) - nP(x)| \le \alpha\sqrt{P(x)(1-P(x))}\sqrt{n}\right\}$$

of variance-typical sequences with constant $\alpha$ (in the sense of Wolfowitz (1964)), where
$N(x|x^n) = |\{i : x_i = x\}|$. For a sequence $x^n$ the empirical distribution $P_{x^n}$ on $\mathcal{X}$ (i.e.
$P_{x^n}(x) = \frac{1}{n}N(x|x^n)$) is called type of $x^n$.

It is easily seen that there are at most $(n+1)^a$ types; this kind of reasoning is generally
called type counting.

³Here and in the sequel log is always understood to base 2, as well as exp. The unit of this rate is
usually called qubit (short for quantum bit: the states of a two-level quantum system $\mathcal{L}(\mathbb{C}^2)$).
Lemma I.8 (cf. Wolfowitz (1964)) For every p.d. $P$ on $\mathcal{X}$ and $\alpha > 0$

$$P^{\otimes n}\big(\mathcal{T}^n_{V,P,\alpha}\big) \ge 1 - \frac{a}{\alpha^2},$$

$$\big|\mathcal{T}^n_{V,P,\alpha}\big| \le \exp\big(nH(P) + Kd\alpha\sqrt{n}\big).$$

Proof. $\mathcal{T}^n_{V,P,\alpha}$ is the intersection of $a$ events, namely for each $x \in \mathcal{X}$ that the mean of
the independent Bernoulli variables $X_i$ with value 1 iff $x_i = x$ has a deviation from its
expectation $P(x)$ of at most $\alpha\sqrt{P(x)(1-P(x))}/\sqrt{n}$. By Chebyshev's inequality each of
these has probability at least $1 - 1/\alpha^2$.

The cardinality estimate is like in the proof of the following lemma. □
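For a small alphabet the set of variance-typical sequences can be enumerated directly, which makes the Chebyshev estimate tangible. The Python sketch below is an illustration under stated assumptions: the distribution `P`, the block length `n` and the constant `alpha` are example values chosen here, the function names are hypothetical, and the constant `K` is taken to be $2(\log e)/e$ with the alphabet size used in the size bound.

```python
import itertools
import math

def typical_set(P, n, alpha):
    """All sequences x^n with |N(x|x^n) - n P(x)| <= alpha*sqrt(P(x)(1-P(x)))*sqrt(n) for every symbol x."""
    symbols = range(len(P))
    result = []
    for seq in itertools.product(symbols, repeat=n):
        ok = all(
            abs(seq.count(x) - n * P[x])
            <= alpha * math.sqrt(P[x] * (1 - P[x])) * math.sqrt(n)
            for x in symbols
        )
        if ok:
            result.append(seq)
    return result

def sequence_probability(P, seq):
    """Probability of seq under the n-fold product distribution of P."""
    return math.prod(P[x] for x in seq)

def entropy(P):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in P if p > 0)

# Example values (assumptions for this illustration).
P = [0.7, 0.3]
n, alpha = 12, 2.0

T = typical_set(P, n, alpha)
prob_T = sum(sequence_probability(P, s) for s in T)

# Chebyshev bound: probability at least 1 - a / alpha^2, with a the alphabet size.
chebyshev_bound = 1 - len(P) / alpha ** 2

# Size bound of the form exp(n H(P) + K a alpha sqrt(n)), exp and log to base 2.
K = 2 * math.log2(math.e) / math.e
size_bound = 2 ** (n * entropy(P) + K * len(P) * alpha * math.sqrt(n))
```

In this example the typical set carries far more probability than the Chebyshev bound guarantees, which is expected: the bound is crude but holds uniformly in `n`.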

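Now construct variance-typical projectors $\Pi^n_{V,\rho,\alpha}$ using typical sequences: for a diagonalization
$\rho = \sum_j q_j\pi_j$ let $s_j = \sqrt{q_j(1-q_j)}$ and

$$\mathcal{T}^n_{V,\rho,\alpha} = \left\{(j_1,\ldots,j_n) : \forall j\ \ |N(j|j^n) - nq_j| \le s_j\alpha\sqrt{n}\right\},$$

and define

$$\Pi^n_{V,\rho,\alpha} = \sum_{(j_1,\ldots,j_n)\in\mathcal{T}^n_{V,\rho,\alpha}} \pi_{j_1}\otimes\cdots\otimes\pi_{j_n}.$$

For a state $\rho$ define $\mu(\rho)$ as the minimal nonzero eigenvalue of $\sqrt{\rho(1-\rho)}$ and $N(\rho) =
\dim\mathrm{supp}\sqrt{\rho(1-\rho)}$, finally $K = 2\frac{\log e}{e}$. Then one has

Lemma I.9 For every state $\rho$ and $n > 0$

$$\mathrm{Tr}\big(\rho^{\otimes n}\Pi^n_{V,\rho,\alpha}\big) \ge 1 - \frac{d}{\alpha^2},$$

$$\mathrm{Tr}\big(\rho^{\otimes n}\Pi^n_{V,\rho,\alpha}\big) \ge 1 - 2N(\rho)e^{-2\mu(\rho)^2\alpha^2},$$

and with $\Pi^n = \Pi^n_{V,\rho,\alpha}$

$$\Pi^n\exp\big(-nH(\rho) - Kd\alpha\sqrt{n}\big) \le \Pi^n\rho^{\otimes n}\Pi^n \le \Pi^n\exp\big(-nH(\rho) + Kd\alpha\sqrt{n}\big),$$

$$\mathrm{Tr}\,\Pi^n_{V,\rho,\alpha} \le \exp\big(nH(\rho) + Kd\alpha\sqrt{n}\big).$$

Every $\eta$-shadow $B$ of $\rho^{\otimes n}$ (this means $0 \le B \le 1$ and $\mathrm{Tr}(\rho^{\otimes n}B) \ge \eta$) satisfies

$$\mathrm{Tr}\,B \ge \left(\eta - 2N(\rho)e^{-2\mu(\rho)^2\alpha^2}\right)\exp\big(nH(\rho) - Kd\alpha\sqrt{n}\big).$$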

Proof. The first estimate is the Chebyshev inequality, as before: the trace is the probability
of a set of variance-typical sequences of eigenvectors of $\rho$ in the product of
the measures given by the eigenvalue lists. Similarly the second estimate is the well
known inequality of Hoeffding (1963). The third estimate is the key: to prove it let
$\pi^n = \pi_{j_1}\otimes\cdots\otimes\pi_{j_n}$ be one of the eigenprojections of $\rho^{\otimes n}$ which contributes to $\Pi^n_{V,\rho,\alpha}$. Then

$$\mathrm{Tr}(\rho^{\otimes n}\pi^n) = q_{j_1}\cdots q_{j_n} = \prod_j q_j^{N(j|j^n)}.$$

Taking logs and using the defining relation for the $N(j|j^n)$ we find

$$\begin{aligned}
\left|-\log\mathrm{Tr}(\rho^{\otimes n}\pi^n) - nH(\rho)\right| &\le \sum_j -\log q_j\,\big|N(j|j^n) - nq_j\big| \\
&\le \sum_j -\alpha\sqrt{n}\sqrt{q_j}\log q_j \\
&= 2\alpha\sqrt{n}\sum_j -\sqrt{q_j}\log\sqrt{q_j} \\
&\le 2d\alpha\sqrt{n}\,\frac{\log e}{e}.
\end{aligned}$$

The rest follows from the following lemma I.10. □
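The key estimate of this proof, namely that every eigenprojection contributing to the variance-typical projector has $-\log$ probability within $Kd\alpha\sqrt{n}$ of $nH(\rho)$, can be checked numerically for a qubit state. The following Python sketch is an illustration only; the function name and the restriction to a state with eigenvalues $(q, 1-q)$, $0 < q < 1$, are assumptions made for the example. Since the probability of an eigenvector sequence depends only on its type, it suffices to loop over the number `k` of factors equal to the first eigenvector.

```python
import math

def check_sandwich(q, n, alpha):
    """Return (worst deviation, bound) for a qubit state with eigenvalues (q, 1 - q).

    worst deviation: max over variance-typical types of | -log2 Tr(rho^{(x)n} pi^n) - n H(rho) |
    bound:           K * d * alpha * sqrt(n) with K = 2 log2(e) / e and d = 2
    """
    eigenvalues = [q, 1 - q]
    d = len(eigenvalues)
    H = -sum(x * math.log2(x) for x in eigenvalues)
    K = 2 * math.log2(math.e) / math.e
    worst = 0.0
    for k in range(n + 1):
        counts = [k, n - k]  # N(j | j^n) for j = first, second eigenvector
        typical = all(
            abs(counts[j] - n * eigenvalues[j])
            <= math.sqrt(eigenvalues[j] * (1 - eigenvalues[j])) * alpha * math.sqrt(n)
            for j in range(d)
        )
        if typical:
            minus_log_prob = -sum(counts[j] * math.log2(eigenvalues[j]) for j in range(d))
            worst = max(worst, abs(minus_log_prob - n * H))
    return worst, K * d * alpha * math.sqrt(n)
```

For instance `check_sandwich(0.9, 200, 3.0)` returns a pair whose first entry stays below the second.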



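Lemma I.10 (Shadow bound) Let $0 \le \Lambda \le 1$ and $\rho$ a state such that for some
$\lambda, \mu_1, \mu_2 > 0$

$$\mathrm{Tr}(\rho\Lambda) \ge 1-\lambda \quad\text{and}\quad \mu_1\Lambda \le \sqrt{\Lambda}\rho\sqrt{\Lambda} \le \mu_2\Lambda.$$

Then $(1-\lambda)\mu_2^{-1} \le \mathrm{Tr}\,\Lambda \le \mu_1^{-1}$ and for $0 \le B \le 1$ with $\mathrm{Tr}(\rho B) \ge \eta$ one has $\mathrm{Tr}\,B \ge
\big(\eta - \sqrt{8\lambda}\big)\mu_2^{-1}$. If $\rho$ and $\Lambda$ commute this can be improved to $\mathrm{Tr}\,B \ge (\eta-\lambda)\mu_2^{-1}$.

Proof. The bounds on $\mathrm{Tr}\,\Lambda$ follow by taking traces in the inequalities in $\sqrt{\Lambda}\rho\sqrt{\Lambda}$ and
using $1-\lambda \le \mathrm{Tr}(\rho\Lambda) \le 1$. For the $\eta$-shadow $B$ observe

$$\mu_2\mathrm{Tr}\,B \ge \mathrm{Tr}(\mu_2\Lambda B) \ge \mathrm{Tr}\big(\sqrt{\Lambda}\rho\sqrt{\Lambda}B\big)
= \mathrm{Tr}(\rho B) - \mathrm{Tr}\big((\rho - \sqrt{\Lambda}\rho\sqrt{\Lambda})B\big) \ge \eta - \big\|\rho - \sqrt{\Lambda}\rho\sqrt{\Lambda}\big\|_1.$$

If $\rho$ and $\Lambda$ commute the trace norm can obviously be estimated by $\lambda$, else we have to
invoke the tender operator lemma to bound it by $\sqrt{8\lambda}$. □

For the benefit of discussions in later chapters let us mention here two other notions of
typical projector: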



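Entropy typical projectors Let $\rho_1,\ldots,\rho_n$ be states on $\mathcal{L}(\mathcal{H})$, with diagonalizations $\rho_i =
\sum_j q_{j|i}\pi_{ij}$ with one-dimensional projectors $\pi_{ij}$. Let $\delta > 0$, and define

$$\mathcal{T}^n_{H,\rho^n,\delta} = \left\{(j_1,\ldots,j_n) : \left|\sum_{i=1}^n -\log q_{j_i|i} - \sum_{i=1}^n H(\rho_i)\right| \le \delta\sqrt{n}\right\}.$$

Define the entropy-typical projector⁴ of $\rho^n$ with constant $\delta$ as

$$\Pi^n_{H,\rho^n,\delta} = \sum_{(j_1,\ldots,j_n)\in\mathcal{T}^n_{H,\rho^n,\delta}} \pi_{1j_1}\otimes\cdots\otimes\pi_{nj_n}.$$

Then we have the following

⁴This is essentially what Schumacher (1995) calls typical subspace.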
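Lemma I.11 There is a constant $K$ depending only on $d$ (in fact one may choose $K \le
\max\{(\log 3)^2, (\log d)^2\}$) such that for arbitrary states $\rho_1,\ldots,\rho_n$

$$\mathrm{Tr}\big(\rho^n\Pi^n_{H,\rho^n,\delta}\big) \ge 1 - \frac{K}{\delta^2}.$$

Proof. This is just Chebyshev's inequality applied to the random variables $X_i = -\log q_{j|i}$
for the diagonalizations $\rho_i = \sum_j q_{j|i}\pi_{ij}$. Observe that $K$ may be any bound for the
variance of the $X_i$. □

Concerning its size we have

Lemma I.12 For the entropy-typical projector

$$\left(1 - \frac{K}{\delta^2}\right)\exp\Big(\sum_{i=1}^n H(\rho_i) - \delta\sqrt{n}\Big) \le \mathrm{Tr}\,\Pi^n_{H,\rho^n,\delta} \le \exp\Big(\sum_{i=1}^n H(\rho_i) + \delta\sqrt{n}\Big).$$

Conversely, if $B$ is an $\eta$-shadow of $\rho^n$ then

$$\mathrm{Tr}\,B \ge \left(\eta - \frac{K}{\delta^2}\right)\exp\Big(\sum_{i=1}^n H(\rho_i) - \delta\sqrt{n}\Big).$$

Proof. Observe that by definition of $\Pi^n = \Pi^n_{H,\rho^n,\delta}$

$$\Pi^n\exp\Big(-\sum_{i=1}^n H(\rho_i) - \delta\sqrt{n}\Big) \le \Pi^n\rho^n\Pi^n \le \Pi^n\exp\Big(-\sum_{i=1}^n H(\rho_i) + \delta\sqrt{n}\Big).$$

Now the lemma follows by the shadow bound lemma I.10. □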

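Constant typical projectors Let $\rho$ be a state with diagonalization $\rho = \sum_j q_j\pi_j$,
$\delta > 0$, then define

$$\mathcal{T}^n_{C,\rho,\delta} = \left\{(j_1,\ldots,j_n) : \forall j\ \ |N(j|j^n) - nq_j| \le \delta\sqrt{n}\right\},$$

and the constant-typical projector

$$\Pi^n_{C,\rho,\delta} = \sum_{j^n\ \text{with}\ \left\|\sum_{i=1}^n\pi_{j_i} - n\rho\right\|_\infty \le \delta\sqrt{n}} \pi_{j_1}\otimes\cdots\otimes\pi_{j_n}.$$

Then one has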
Lemma I.13 (Weak law) Let $\rho, \rho_1,\ldots,\rho_n$ be states of a system and $\delta,\epsilon > 0$ such that

$$\left\|\frac{1}{n}\sum_{i=1}^n\rho_i - \rho\right\|_1 \le \epsilon.$$

Then

[...]

Proof. Consider the diagonalization $\rho = \sum_j q_j\pi_j$ and the conditional expectation map

[...]

Defining $\rho_i' = \kappa_*(\rho_i)$ we claim that

[...]

Indeed observe that we have

$$\left\|\frac{1}{n}\sum_{i=1}^n\rho_i' - \rho\right\|_1 \le \left\|\frac{1}{n}\sum_{i=1}^n\rho_i - \rho\right\|_1 \le \epsilon.$$

Thus for $j^n = (j_1,\ldots,j_n)$ with

[...]

we have by triangle inequality

$$[...] \le (\delta + \epsilon\sqrt{n})\sqrt{n}.$$

So we can estimate

[...]

the last line by $d$ uses of Chebyshev's inequality, as in the proof of lemma I.9. □

Concerning the size of this projector we have

Lemma I.14 For every state $\rho$ and $0 < \delta \le \frac{1}{2}\sqrt{n}$

$$\mathrm{Tr}\,\Pi^n_{C,\rho,\delta} \le (n+1)^d\exp\Big(nH(\rho) + nd\,\eta\big(\tfrac{\delta}{\sqrt{n}}\big)\Big).$$
Proof. The whole question reduces obviously to counting sequences of eigenvectors of $\rho$
with type close to the p.d. given by the eigenvalue list of $\rho$. Each sequence of type $P$
has $P^{\otimes n}$-probability $\exp(-nH(P))$. Thus there are at most $\exp(nH(P))$ of these. Since
there are at most $(n+1)^d$ many types, and by the continuity of entropy (lemma A.4) the
statement follows. □

The constant typical projectors will be used as shadows of whole sets (namely of states
which satisfy the "average" condition of the weak law lemma I.13).



Schumacher's quantum coding 

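Let $\alpha > 0$. The Schumacher scheme with constant $\alpha$ for the q-DMS $(\mathcal{P},P)$ is the
following family of n-block codes $(\varepsilon_*,\delta_*)$ with quantum encoding: define $\Pi^n = \Pi^n_{V,P\mathcal{P},\alpha}$
and the Hilbert space $\mathcal{K} = \mathrm{im}\,\Pi^n$, and

$$\varepsilon_* : \mathcal{L}(\mathcal{H})_*^{\otimes n} \to \mathcal{L}(\mathcal{K})_*,\qquad
\sigma \mapsto \Pi^n\sigma\Pi^n + \frac{1-\mathrm{Tr}(\sigma\Pi^n)}{\dim\mathcal{K}}\,1_{\mathcal{K}},$$

$$\delta_* : \mathcal{L}(\mathcal{K})_* \to \mathcal{L}(\mathcal{H})_*^{\otimes n},\qquad
\sigma \mapsto \sigma.$$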

Remark I.15 Essentially the above scheme was first defined by Schumacher (1995)
with a slightly different definition of $\Pi^n$. The great contribution of Schumacher (1995)
was to notice the possibility and importance of having a typical subspace, and the following
theorem is just a variation of the original argument. Subsequently there appeared minor
modifications and refinements (Jozsa & Schumacher (1994) and Jozsa et al. (1998)),
but all rely on one or another notion of typical subspace of $\mathcal{H}^{\otimes n}$.
Theorem I.16 The Schumacher scheme has rate

$$R(\varepsilon_*,\delta_*) \le H(P\mathcal{P}) + \frac{Kd\alpha}{\sqrt{n}}$$

and entanglement fidelity

[...]

Proof. The rate estimate is immediate from lemma I.9. For the fidelity consider a purification
of $P\mathcal{P} = \sum_j q_j\pi_j$, e.g. the projector of a vector $|\psi\rangle$ built from the square roots $\sqrt{q_j}$ on $\mathcal{L}(\mathcal{H}^{\otimes 2})$.
With that the fidelity estimate follows easily from the shadow lemma. □
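The rate statement can be made concrete for a source whose average state is a qubit state with eigenvalues $(q, 1-q)$: the dimension of the typical subspace is then a sum of binomial coefficients. The Python sketch below is an illustration only, not the author's construction in code; the function names are hypothetical, and the constant $K = 2(\log e)/e$ and $d = 2$ are used in the bound $H + Kd\alpha/\sqrt{n}$.

```python
import math

def binary_entropy(q):
    """Entropy in bits of the eigenvalue list (q, 1 - q), assuming 0 < q < 1."""
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def schumacher_rate(q, n, alpha):
    """Rate (1/n) log2 dim K of the variance-typical subspace for eigenvalues (q, 1 - q)."""
    s = math.sqrt(q * (1 - q))
    dim = 0
    for k in range(n + 1):  # k = number of factors equal to the first eigenvector
        # For d = 2 the two typicality conditions coincide, since the counts sum to n.
        if abs(k - n * q) <= s * alpha * math.sqrt(n):
            dim += math.comb(n, k)
    if dim == 0:
        raise ValueError("typical subspace is empty; increase alpha or n")
    return math.log2(dim) / n

def rate_bound(q, n, alpha):
    """Upper bound H + K d alpha / sqrt(n) with d = 2."""
    K = 2 * math.log2(math.e) / math.e
    return binary_entropy(q) + K * 2 * alpha / math.sqrt(n)
```

Evaluating `schumacher_rate(0.9, n, 2.0)` for growing `n` shows the rate approaching the entropy from above while staying below `rate_bound`.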
By slightly changing the definition of the subspace used we arrive at the JHHH-scheme
of Jozsa et al. (1998): just take for $\Pi^n$ the projector

[...]

(the least common support) with some rate $R > 0$. Then in Jozsa et al. (1998) it is
proved that this gives universally good compression of all sources $(\mathcal{P},P)$ with $H(P\mathcal{P}) < R$.
For one thing (see Jozsa et al. (1998))

$$\mathrm{Tr}\,\Pi^n \le (n+1)^{d^2+d}\exp(nR),$$

and for the fidelity one has

Theorem I.17 Let $(\varepsilon_*,\delta_*)$ be the JHHH-scheme with rate $R$ and block length $n$. Then for
every q-DMS $(\mathcal{P},P)$

$$F_e(\varepsilon_*,\delta_*) \ge 1 - 2(n+1)^d\exp\Big(-n\cdot\min_{H(\nu)\ge R} D(\nu\|P\mathcal{P})\Big).$$

Proof. First note that by direct calculation for $\nu$ codiagonal with a state $\rho$ we have

$$\Pi^n_{C,\nu,0}\,\rho^{\otimes n}\,\Pi^n_{C,\nu,0} = \Pi^n_{C,\nu,0}\exp\big(-nD(\nu\|\rho) - nH(\nu)\big)$$

(see lemma II.12). Fix a diagonalization $P\mathcal{P} = \sum_j q_j\pi_j$ and observe

$$[...] \ge \sum_{\nu\in\mathbb{C}[\pi_1,\ldots,\pi_d],\ H(\nu)\le R}\Pi^n_{C,\nu,0}.$$

Using the simple facts that $\Pi^n_{C,\nu,0} \ne 0$ only if $\nu \in \frac{1}{n}\mathbb{N}[\pi_j|j]$, and $\mathrm{Tr}\,\Pi^n_{C,\nu,0} \le \exp(nH(\nu))$,
we find as in the previous theorem

$$1 - F_e(\varepsilon_*,\delta_*) \le 2\sum_{\nu\in\frac{1}{n}\mathbb{N}[\pi_j|j],\ H(\nu)>R}\exp\big(-nD(\nu\|P\mathcal{P})\big)
\le 2(n+1)^d\exp\Big(-n\cdot\min_{H(\nu)\ge R}D(\nu\|P\mathcal{P})\Big),$$

where the last estimate is by type counting: there are at most $(n+1)^d$ different $\nu$ diagonal
in the basis $\{\pi_j|j\}$ with $\Pi^n_{C,\nu,0} \ne 0$. □

Strong converse 



The first proofs by Schumacher (1995) and Jozsa & Schumacher (1994) for the
optimality of the Schumacher scheme were valid only under the additional assumption
that $\delta_*$ is of the form $\delta_*(\sigma) = U\sigma U^*$ for a unitary embedding $U$ of $\mathcal{K}$ into $\mathcal{H}^{\otimes n}$. Also
they achieved the bound $H(P\mathcal{P})$ only in the limit of $\lambda \to 0$ (so they proved a weak
converse). The proof of Barnum et al. (1996) removed the restriction on $\delta_*$, but still
yields only a weak converse. Also it works with some surprising and difficult fidelity
estimates, involving even mixed state fidelity, see Jozsa (1994) (We may note that they
seem to be related to our inequalities of theorem I.1). We should also mention the work
of Allahverdyan & Saakian (1997a) where a weak converse was proved for quantum
encodings and using entanglement fidelity (compare our theorem IV.7 with $s = 1$). The
criticism of the authors on Barnum et al. (1996) however is unjustified: by the above
discussion (proof of theorem I.7) their result is weaker than that of Barnum et al. (1996).

Then Horodecki (199?) noticed that considering $D$ instead of $F$ drastically simplifies
the proof. His argument is as follows:

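Assume that we are given a code $(\varepsilon_*,\delta_*)$ with arbitrary encoding in the states on a
$k$-dimensional Hilbert space and

$$D(\varepsilon_*,\delta_*) = \sum_{\pi^n\in\mathcal{P}^n}P^n(\pi^n)\,\frac{1}{2}\|\pi^n - \delta_*\varepsilon_*\pi^n\|_1 \le \lambda.$$

So by Markov's inequality there is a subset $\mathcal{C} \subset \mathcal{P}^n$ with $P^n(\mathcal{C}) \ge 1 - 2\sqrt{\lambda}$ and

$$\forall\pi^n\in\mathcal{C}\quad \|\pi^n - \delta_*\varepsilon_*\pi^n\|_1 \le \sqrt{\lambda}.$$

Now form the state $\sigma = \sum_{\pi^n\in\mathcal{P}^n}P^n(\pi^n)\varepsilon_*\pi^n$, then by Uhlmann's monotonicity of the
quantum I-divergence (theorem A.5)

$$D(\varepsilon_*\pi^n\|\sigma) \ge D(\delta_*\varepsilon_*\pi^n\|\delta_*\sigma).$$

Averaging we obtain

$$\sum_{\pi^n\in\mathcal{P}^n}P^n(\pi^n)D(\varepsilon_*\pi^n\|\sigma) \ge \sum_{\pi^n\in\mathcal{P}^n}P^n(\pi^n)D(\delta_*\varepsilon_*\pi^n\|\delta_*\sigma).$$

Now it is straightforward to calculate the l.h.s. of this to $H(\sigma) - \sum_{\pi^n\in\mathcal{P}^n}P^n(\pi^n)H(\varepsilon_*\pi^n)$,
whereas the r.h.s. evaluates similarly to $H(\delta_*\sigma) - \sum_{\pi^n\in\mathcal{P}^n}P^n(\pi^n)H(\delta_*\varepsilon_*\pi^n)$. Since $2\lambda \le
1/2$ and $\sqrt{\lambda} \le 1/2$ we can use a continuity property of $H$ (see lemma A.4):
$\|\delta_*\sigma - (P\mathcal{P})^{\otimes n}\|_1 \le 2\lambda$ implies

$$\left|H(\delta_*\sigma) - H\big((P\mathcal{P})^{\otimes n}\big)\right| \le -2\lambda\log\frac{2\lambda}{d^n},$$

and (for $\pi^n \in \mathcal{C}$) $\|\delta_*\varepsilon_*\pi^n - \pi^n\|_1 \le \sqrt{\lambda}$ implies

$$\left|H(\delta_*\varepsilon_*\pi^n) - H(\pi^n)\right| \le -\sqrt{\lambda}\log\frac{\sqrt{\lambda}}{d^n}.$$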
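Combining we get the chain of inequalities

$$\begin{aligned}
\log k &\ge H(\sigma) \\
&\ge H(\sigma) - \sum_{\pi^n\in\mathcal{P}^n}P^n(\pi^n)H(\varepsilon_*\pi^n) \\
&\ge H(\delta_*\sigma) - \sum_{\pi^n\in\mathcal{P}^n}P^n(\pi^n)H(\delta_*\varepsilon_*\pi^n) \\
&\ge nH(P\mathcal{P}) - \sum_{\pi^n\in\mathcal{P}^n}P^n(\pi^n)H(\pi^n) - 2\sqrt{\lambda}\log d^n + 2\lambda\log\frac{2\lambda}{d^n} + \sqrt{\lambda}\log\frac{\sqrt{\lambda}}{d^n} \\
&= nH(P\mathcal{P}) - n(2\lambda + 3\sqrt{\lambda})\log d + 2\lambda\log 2\lambda + \sqrt{\lambda}\log\sqrt{\lambda}.
\end{aligned}$$

Thus we proved

Theorem I.18 (Weak converse) For every q-DMS $(\mathcal{P},P)$

$$\liminf_{\lambda\to 0}R_{a,F}(\lambda) = \liminf_{\lambda\to 0}R_{a,D}(\lambda) \ge H(P\mathcal{P}). \qquad\Box$$

But in fact much more is true: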

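Theorem I.19 (Strong converse) Let $(\mathcal{P},P)$ be a q-DMS and $(\varepsilon_*,\delta_*)$ an $(n,\lambda)_F$-code
with arbitrary encoding, and $\alpha > 0$. Then

$$\dim\mathcal{K} \ge \left(1 - \lambda - 4\sqrt{N(P\mathcal{P})}\,e^{-\mu(P\mathcal{P})^2\alpha^2}\right)\cdot\exp\big(nH(P\mathcal{P}) - Kd\alpha\sqrt{n}\big).$$

Proof. Let $B = \delta_*(1_{\mathcal{K}})$ and $\Pi^n = \Pi^n_{V,P\mathcal{P},\alpha}$. Since $\varepsilon_*\pi^n \le 1_{\mathcal{K}}$ for every $\pi^n \in \mathcal{P}^n$ it is clear
that $\delta_*\varepsilon_*\pi^n \le B$. Thus

$$\begin{aligned}
\mathrm{Tr}(B\cdot\Pi^n\pi^n\Pi^n) &\ge \mathrm{Tr}\big((\delta_*\varepsilon_*\pi^n)\Pi^n\pi^n\Pi^n\big) \\
&= \mathrm{Tr}\big((\delta_*\varepsilon_*\pi^n)\pi^n\big) - \mathrm{Tr}\big(\delta_*\varepsilon_*\pi^n(\pi^n - \Pi^n\pi^n\Pi^n)\big) \\
&\ge \mathrm{Tr}\big((\delta_*\varepsilon_*\pi^n)\pi^n\big) - \|\pi^n - \Pi^n\pi^n\Pi^n\|_1 \\
&\ge \mathrm{Tr}\big((\delta_*\varepsilon_*\pi^n)\pi^n\big) - \sqrt{8(1 - \mathrm{Tr}\,\pi^n\Pi^n)}
\end{aligned}$$

(the last estimate by lemma I.4). Averaging over $P^{\otimes n}$ we find, with the shadow lemma
and concavity of the square root:

$$\begin{aligned}
\mathrm{Tr}\big(\Pi^n(P\mathcal{P})^{\otimes n}\Pi^n B\big) &\ge F - \sqrt{8\big(1 - \mathrm{Tr}\,(P\mathcal{P})^{\otimes n}\Pi^n\big)} \\
&\ge 1 - \lambda - 4\sqrt{N(P\mathcal{P})}\,e^{-\mu(P\mathcal{P})^2\alpha^2}.
\end{aligned}$$

Since by lemma I.9

$$\Pi^n(P\mathcal{P})^{\otimes n}\Pi^n \le \Pi^n\exp\big(-nH(P\mathcal{P}) + Kd\alpha\sqrt{n}\big)$$

we conclude

$$\mathrm{Tr}\,B \ge \mathrm{Tr}(B\Pi^n) \ge \left(1 - \lambda - 4\sqrt{N(P\mathcal{P})}\,e^{-\mu(P\mathcal{P})^2\alpha^2}\right)\cdot\exp\big(nH(P\mathcal{P}) - Kd\alpha\sqrt{n}\big),$$

and with $\dim\mathcal{K} = \mathrm{Tr}\,1_{\mathcal{K}} = \mathrm{Tr}\,B$ the proof is complete. □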
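To see what the strong converse says about rates, one can take logarithms in the dimension bound and divide by $n$: for any fixed error $\lambda < 1$ the rate of a code is at least $H(P\mathcal{P}) - Kd\alpha/\sqrt{n} + \frac{1}{n}\log(\text{prefactor})$, provided $\alpha$ is large enough for the prefactor to be positive. The Python sketch below evaluates this expression; it is an illustration only, the function name is hypothetical, and the input is assumed to be the eigenvalue list of the average state (so that $d$ is its length), with $K = 2(\log e)/e$.

```python
import math

def strong_converse_rate_bound(eigenvalues, n, lam, alpha):
    """Lower bound on (1/n) log2 dim K read off from the strong converse.

    eigenvalues: eigenvalue list of the average state (length d, summing to 1)
    Returns None when the prefactor 1 - lam - 4 sqrt(N) exp(-mu^2 alpha^2) is not positive,
    in which case the bound is vacuous for this choice of alpha.
    """
    d = len(eigenvalues)
    H = -sum(q * math.log2(q) for q in eigenvalues if q > 0)
    # Nonzero eigenvalues of sqrt(rho (1 - rho)): their number is N, their minimum is mu.
    spreads = [math.sqrt(q * (1 - q)) for q in eigenvalues if 0 < q < 1]
    N = len(spreads)
    mu = min(spreads) if spreads else 0.0
    K = 2 * math.log2(math.e) / math.e
    prefactor = 1 - lam - 4 * math.sqrt(N) * math.exp(-(mu * alpha) ** 2)
    if prefactor <= 0:
        return None
    return H - K * d * alpha / math.sqrt(n) + math.log2(prefactor) / n
```

Even for a large tolerated error such as `lam = 0.5`, the returned bound approaches the entropy as `n` grows, which is the content of a strong converse.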

Corollary I.20 Let $\epsilon_n = o(n)$ and $\lambda_n \le 1 - e^{-\epsilon_n}$. Then for every sequence $(\varepsilon_{n*},\delta_{n*})$ of
$(n,\lambda_n)_F$-codes with arbitrary encoding for the q-DMS $(\mathcal{P},P)$

$$\liminf_{n\to\infty}R(\varepsilon_{n*},\delta_{n*}) \ge H(P\mathcal{P}). \qquad\Box$$

Remark I.21 The proof of the above theorem is remarkable in that it employs a positive
operator which is not necessarily bounded by 1 (this is why we could not directly apply the
shadow). Even though it has consequently no interpretation as a physical measurement
(maybe it has one as a quantity), it can be analyzed to give information about the coding
scheme.

Relation to classical source coding 

Consider a slight variation of our initial model: $\mathcal{P}$ is now a set of pure states on a finite
dimensional C*-algebra $\mathfrak{A}$ (which is a direct sum of full matrix algebras) and we
consider only $F$ as a fidelity measure. A major (and extremal) example is a classical
source, i.e. $\mathfrak{A} = \mathbb{C}\mathcal{X}$ is commutative, with a finite set $\mathcal{X}$, and w.l.o.g. $\mathcal{P} = \mathcal{X}$ (all possible
pure states). The general case may be seen as an interpolation between this and the
quantum case $\mathfrak{A} = \mathcal{L}(\mathcal{H})$.

Observe that since $P\mathcal{P} \in \mathfrak{A}_*$ we find the typical projectors $\Pi^n$ in $\mathfrak{A}^{\otimes n}$ (note that for
$\mathfrak{A} = \mathbb{C}\mathcal{X}$ such a projector is given just by a set of typical sequences from $\mathcal{X}^n$). This means
that the Schumacher and JHHH-schemes make sense by just replacing $\mathcal{L}(\mathcal{H})$ in the
definitions by $\mathfrak{A}$, without changing the fidelity values (note again that for $\mathfrak{A} = \mathbb{C}\mathcal{X}$ the
average fidelity is just the classical success probability). The strong converse need not
be modified at all as $\mathcal{L}(\mathcal{H})$ is already the most "spacious" algebra imaginable. Thus we
arrive (with obvious definitions) at

Theorem I.22 For all $\lambda \in (0,1)$ the arbitrary and quantum encoding rates of the discrete
memoryless source $(\mathcal{P},P)$ on the C*-algebra $\mathfrak{A}$ are equal to the von Neumann entropy
of the ensemble $(\mathcal{P},P)$:

$$R_{q,F}(\lambda) = R_{a,F}(\lambda) = H(P\mathcal{P}). \qquad\Box$$
Open questions

Dimension Why stay with finite dimensional spaces? In fact there is no obstruction
to defining sensibly a Schumacher scheme, indeed the original paper of Schumacher
(1995) had no dimension restriction, instead (implicitly) requiring bounded variance of
the information density, i.e. in the present setting the condition $\mathrm{Tr}(\rho(\log\rho)^2) < \infty$. Then
the typical projector of choice is the entropy typical one, and in fact the reader may as
an exercise translate the coding theorem and our strong converse to this situation.

Memory It appears that no one has formalized the concept of coding a "quantum
Markov chain".

Lossless coding It might be worthwhile to try to convert the techniques of Huffman
coding, and especially of arithmetic coding of the source, to quantum sources. See
Braunstein et al. (1998) for a discussion.

Rate distortion theory Develop further a rate distortion theory: the start to this was
made by Bendjaballah et al. (1998), and a short note of Barnum (199?).



Refined resource analysis A not yet investigated (and perhaps most interesting)
problem is, how much "quantum" one actually needs to compress the source $(\mathcal{P},P)$:
whereas $\dim\mathcal{K}$ is shown by theorem I.22 to be a good resource measure, it is oblivious to
the difference between an orthogonal ensemble (for whose coding a commutative algebra,
i.e. a classical system, suffices), and a highly non-orthogonal one (which presumably needs
all the quantum resources, i.e. possibilities of superpositions, of $k$ degrees of freedom).
As a measure of this "quantum" resource I propose the following:
A coding scheme is a pair $(\varepsilon_*,\delta_*)$ with

$$\varepsilon_* : \mathcal{P}^n \to \mathfrak{K}_* \ \text{a mapping},$$

$$\delta_* : \mathfrak{K}_* \to \mathfrak{A}_*^{\otimes n} \ \text{a quantum operation},$$

where $\mathfrak{K}$ is a finite dimensional C*-algebra. Quantum and arbitrary encoding schemes are
as before. Observe that $\mathrm{Tr}\,1_{\mathfrak{K}}$ takes now the place of the previous $\dim\mathcal{K}$. Define the, say,
rate of superposition as

$$r(\varepsilon_*,\delta_*) = \frac{1}{n}\big(\log\dim_{\mathbb{C}}\mathfrak{K} - \log\mathrm{Tr}\,1_{\mathfrak{K}}\big).$$

Observe that $0 \le r(\varepsilon_*,\delta_*) \le \frac{1}{n}\log\mathrm{Tr}\,1_{\mathfrak{K}}$, with $r(\varepsilon_*,\delta_*) = 0$ iff $\mathfrak{K}$ is commutative.
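For a finite dimensional C*-algebra written as a direct sum of full matrix algebras of sizes $d_1,\ldots,d_m$, the two quantities in the rate of superposition are elementary: the vector space dimension is $\sum_i d_i^2$ and the trace of the unit is $\sum_i d_i$. The Python sketch below evaluates the rate under exactly this reading of the definition, which is an assumption made for the illustration; the function name and the encoding of the algebra as a list of block sizes are hypothetical.

```python
import math

def rate_of_superposition(block_sizes, n):
    """r = (1/n) (log2 dim K - log2 Tr 1_K) for K = direct sum of full matrix algebras.

    block_sizes: list of the matrix sizes d_i of the summands
    n:           block length of the code
    """
    dim_algebra = sum(d * d for d in block_sizes)  # dimension as a complex vector space
    trace_of_unit = sum(block_sizes)               # Tr 1_K
    return (math.log2(dim_algebra) - math.log2(trace_of_unit)) / n
```

A commutative algebra such as `[1, 1, 1, 1]` gives rate 0, while a single full matrix algebra `[4]` attains the upper bound $\frac{1}{n}\log\mathrm{Tr}\,1$; mixed block structures fall in between.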

Now define for $\lambda \in (0,1)$, $R > 0$ the $\lambda$-rates of superposition with arbitrary and
quantum encoding:

$$r_{a,F}(\lambda,R) = \limsup_{n\to\infty}\,\min\{r(\varepsilon_*,\delta_*) : (\varepsilon_*,\delta_*) \text{ an } (n,\lambda)_F\text{-code (arb. enc.)},\ R(\varepsilon_*,\delta_*) \le R\},$$

$$r_{q,F}(\lambda,R) = \limsup_{n\to\infty}\,\min\{r(\varepsilon_*,\delta_*) : (\varepsilon_*,\delta_*) \text{ an } (n,\lambda)_F\text{-code (qu. enc.)},\ R(\varepsilon_*,\delta_*) \le R\}.$$

It is obvious that $r_{a,F}$ and $r_{q,F}$ are nonincreasing functions of $R$, and that both are upper
bounded by $H(P\mathcal{P})$. The problem is now to analyze $r_{a,F}$ and $r_{q,F}$ depending on $\lambda$ and $R$.

• It is clear that $r_{a,F}(\lambda,R) = 0$ if $R$ is large enough ($R = H(P)$ suffices). It would
be interesting to determine the exact threshold, the value at $R = H(P\mathcal{P})$ and the
behavior between these points. In any case, I conjecture that $r_{a,F}(\lambda,R)$ does not
depend on $\lambda \in (0,1)$.

• I conjecture further that $r_{q,F}$ depends neither on $\lambda \in (0,1)$ nor on $R > H(P\mathcal{P})$. If
this is true $r_{q,F}$ is an interesting ensemble property of $(\mathcal{P},P)$.

Chapter II

Quantum Channel Coding 



In this chapter we introduce the notion of a quantum channel. From the beginning we
focus on the product state capacity for transmission of classical information, and prove
coding theorem and strong converse, even for nonstationary channels. In the finite stationary
case we can sharpen our rate estimates and derive some bounds for the reliability
function. As a corollary to our strong converse we obtain another proof of the Holevo
bound.



Quantum channels and codes 



The following definition is after Holevo (1977): a (discrete memoryless) quantum channel (q-DMC) is a completely positive, trace preserving mapping $\varphi_*$ from the states on a C*-algebra $\mathfrak{A}$ into the states on $\mathcal{L}(\mathcal{H})$, where $d = \dim\mathcal{H}$ is assumed to be finite.

A nonstationary q-DMC is a sequence $(\varphi_{n*})_{n\in\mathbb{N}}$ of q-DMCs, with a global Hilbert space $\mathcal{H}$. This extends the concept of q-DMCs, which are contained as constant sequences.

An n-block code for a nonstationary quantum channel $(\varphi_{n*})_n$ is a pair $(f,D)$, where $f$ is a mapping from a finite set $\mathcal{M}$ into $\mathfrak{S}(\mathfrak{A}_1)\times\cdots\times\mathfrak{S}(\mathfrak{A}_n)$, and $D$ is an observable on $\mathcal{L}(\mathcal{H})^{\otimes n}$ indexed by $\mathcal{M}' \supset \mathcal{M}$, i.e. a partition of 1 into positive operators $D_m$, $m \in \mathcal{M}'$. The (maximum) error probability of the code is defined as

$$e(f,D) = \max\{1 - \mathrm{Tr}(\varphi_*^{\otimes n}(f(m))D_m) : m \in \mathcal{M}\}.$$

We call $(f,D)$ an $(n,\lambda)$-code, if $e(f,D) \le \lambda$. The rate of an n-block code is defined as $\frac{1}{n}\log|\mathcal{M}|$. Finally define $N(n,\lambda)$ as the maximal size (i.e. $|\mathcal{M}|$) of an $(n,\lambda)$-code.
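The definitions of error probability and rate can be illustrated by a minimal sketch for block length one and a qubit output. The two codeword states, the projective decoder and all function names are hypothetical choices made only for this illustration; matrices are plain nested lists.

```python
import math

def trace_of_product(a, b):
    """Tr(a b) for square matrices given as nested lists."""
    size = len(a)
    return sum(a[i][k] * b[k][i] for i in range(size) for k in range(size))

def max_error_probability(output_states, decoder):
    """e(f, D) = max_m (1 - Tr(state_m D_m)); both lists are indexed by the message m."""
    return max(1.0 - trace_of_product(rho, d_m).real
               for rho, d_m in zip(output_states, decoder))

def rate(num_messages, n):
    """Rate (1/n) log |M| in nats."""
    return math.log(num_messages) / n

ket0 = [[1.0, 0.0], [0.0, 0.0]]   # projector onto |0>
ket1 = [[0.0, 0.0], [0.0, 1.0]]   # projector onto |1>
plus = [[0.5, 0.5], [0.5, 0.5]]   # projector onto (|0> + |1>)/sqrt(2)

# Two messages with output states |0> and |+>, decoded in the computational basis.
error = max_error_probability([ket0, plus], [ket0, ket1])
code_rate = rate(2, 1)
```

Here the first message is always decoded correctly, while the second is confused with probability 1/2, so this toy code is a $(1,\lambda)$-code exactly for $\lambda \ge 1/2$.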

Remark II.1 Observe that we did not allow all joint states of the system $\mathfrak{A}_1\otimes\cdots\otimes\mathfrak{A}_n$ as code words, but only product states. This is the restriction under which the current theory was done. It is unknown if the following theorem II.2 is still true in the more general model: maybe higher capacities can be achieved there, see the discussion of Schumacher & Westmoreland (1997).

With our restriction we may without harm identify a channel mapping $\varphi_*$ with its image $\mathfrak{W}_\varphi = \varphi_*(\mathfrak{S}(\mathfrak{A}))$ in the set of states on $\mathcal{L}(\mathcal{H})$ (for then the image of an input state under $\varphi_*^{\otimes n}$ is a product state on $\mathcal{L}(\mathcal{H})^{\otimes n}$).

Generalizing, a nonstationary quantum channel is now a sequence $(\mathfrak{W}_n)_n$ of arbitrary (measurable) subsets of states on a fixed $\mathcal{L}(\mathcal{H})$. In this spirit we reformulate the definition of an n-block code as a pair $(f,D)$ with a mapping $f : \mathcal{M} \to \mathfrak{W}_1\times\cdots\times\mathfrak{W}_n$ and $D$ as before. The main result of the present chapter (to be proved in the following sections) is



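Theorem II.2 Let $(\mathfrak{W}_1,\mathfrak{W}_2,\ldots)$ a nonstationary q-DMC, and

$$C(\mathfrak{W}_i) = \sup_{P \text{ p.d. on } \mathfrak{W}_i} I(P;\mathfrak{W}_i)$$

(with $I(P;\mathfrak{W}) = H(P\mathfrak{W}) - H(\mathfrak{W}|P)$, see the appendix). Then for every $\lambda \in (0,1)$

$$\frac{1}{n}\log N(n,\lambda) - \frac{1}{n}\sum_{i=1}^{n} C(\mathfrak{W}_i) \to 0$$

as $n \to \infty$.

Proof. Combine the coding theorem II.4 and the strong converse theorem II.7.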



□ 
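The quantity $I(P;\mathfrak{W}) = H(P\mathfrak{W}) - H(\mathfrak{W}|P)$ entering the capacity formula can be evaluated directly for small examples. The following is a minimal sketch for finitely many qubit states with real symmetric density matrices, using the closed form for the eigenvalues of a 2 x 2 matrix; entropies are in bits here (an assumption of the sketch, not a convention of the text), and the two example ensembles are hypothetical. The supremum over $P$ is not attempted.

```python
import math

def eigenvalues_2x2(m):
    """Eigenvalues of a real symmetric 2 x 2 matrix from its trace and determinant."""
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = math.sqrt(max(trace * trace - 4.0 * det, 0.0))
    return [(trace + disc) / 2.0, (trace - disc) / 2.0]

def entropy_bits(m):
    """von Neumann entropy of a 2 x 2 density matrix, in bits."""
    return -sum(x * math.log2(x) for x in eigenvalues_2x2(m) if x > 1e-15)

def mutual_information(probs, states):
    """I(P; W) = H(PW) - H(W|P) for a finite ensemble of qubit states."""
    average = [[sum(p * s[i][j] for p, s in zip(probs, states)) for j in range(2)]
               for i in range(2)]
    return entropy_bits(average) - sum(p * entropy_bits(s) for p, s in zip(probs, states))

# Commuting states: this is a binary symmetric channel with crossover 0.1.
commuting = mutual_information([0.5, 0.5], [[[0.9, 0.0], [0.0, 0.1]],
                                            [[0.1, 0.0], [0.0, 0.9]]])
# Non-commuting states with the same eigenvalues 0.9 and 0.1.
noncommuting = mutual_information([0.5, 0.5], [[[0.9, 0.0], [0.0, 0.1]],
                                               [[0.5, 0.4], [0.4, 0.5]]])
```

For the commuting pair the value is $1 - h(0.1) \approx 0.531$ bits, the familiar classical expression; the non-commuting pair gives a smaller value because the two states are harder to tell apart.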



This theorem justifies the name capacity (of the channel $\mathfrak{W}$) for the quantity $C(\mathfrak{W})$, even in the strong sense of Wolfowitz (1964). Observe that this theorem is a quantum generalization of a theorem by Ahlswede (1968).

Remark II.3 It should be clear that the same (including proofs) applies if the output system $\mathcal{L}(\mathcal{H})$ is replaced by a *-subalgebra $\mathfrak{A}$.



Maximal code construction 

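Theorem II.4 (Maximal codes) For $0 < \tau, \lambda < 1$ there is a constant $K'$ and $\delta > 0$ such that for every nonstationary q-DMC $(\mathfrak{W}_n)_n$, distributions $P_i$ on $\mathfrak{W}_i$ and $\mathcal{A} \subset \mathfrak{W}^n$ with $P^n(\mathcal{A}) \ge \tau$ there exists an $(n,\lambda)$-code $(f,D)$ with the properties

$$\forall m \in \mathcal{M}:\quad f(m) \in \mathcal{A} \ \text{ and } \ \mathrm{Tr}\,D_m \le \mathrm{Tr}\,\Pi^n_{H,f(m),\delta},$$

$$\log|\mathcal{M}| \ge H(P^n\mathfrak{W}^n) - H(\mathfrak{W}^n|P^n) - K'\sqrt{n} = \sum_{i=1}^{n}\left(H(P_i\mathfrak{W}_i) - H(\mathfrak{W}_i|P_i)\right) - K'\sqrt{n}.$$

Here we identify $(\rho_1,\ldots,\rho_n)$ with $\rho^n = \rho_1\otimes\cdots\otimes\rho_n$.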
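Proof. On every $\mathfrak{W}_i$ the entropy $H$ is a random variable with expectation $H(\mathfrak{W}_i|P_i)$ and variance bounded by $(\log d)^2$. Define $\delta = \max\{\sqrt{2/\lambda}, \sqrt{2/\tau}\,\log d\}$, then by Chebyshev's inequality the set

$$\mathcal{A}' = \left\{\rho^n \in \mathcal{A} : \left|\sum_{i=1}^{n} H(\rho_i) - \sum_{i=1}^{n} H(\mathfrak{W}_i|P_i)\right| \le \delta\sqrt{n}\right\}$$

has probability $P^n(\mathcal{A}') \ge \tau/2$. Now let $(f,D)$ a maximal $(n,\lambda)$-code with

$$\forall m \in \mathcal{M}:\quad f(m) \in \mathcal{A}' \ \text{ and } \ \mathrm{Tr}\,D_m \le \mathrm{Tr}\,\Pi^n_{H,f(m),\delta}.$$

Define $B = \sum_{m\in\mathcal{M}} D_m$. We claim that with $\eta = \min\{1-\lambda, \lambda^2/32\}$

$$\forall \rho^n \in \mathcal{A}':\quad \mathrm{Tr}(\rho^n B) \ge \eta.$$

This is clear for codewords, and true for the other states because otherwise we could extend our code by the codeword $\rho^n$ with corresponding observable operator

$$D' = \sqrt{1-B}\,\Pi^n_{H,\rho^n,\delta}\sqrt{1-B},$$

which clearly satisfies the trace bound (note that $B + D' \le 1$): to see this apply the tender operator lemma to obtain

$$\left\|\rho^n - \sqrt{1-B}\,\rho^n\sqrt{1-B}\right\|_1 \le \sqrt{8\eta} \le \frac{\lambda}{2}.$$

Thus

$$\mathrm{Tr}\left(\rho^n\sqrt{1-B}\,\Pi^n_{H,\rho^n,\delta}\sqrt{1-B}\right) = \mathrm{Tr}\left(\rho^n\Pi^n_{H,\rho^n,\delta}\right) - \mathrm{Tr}\left(\left(\rho^n - \sqrt{1-B}\,\rho^n\sqrt{1-B}\right)\Pi^n_{H,\rho^n,\delta}\right) \ge 1 - \lambda.$$

So $B$ is an $\eta$-shadow of $\mathcal{A}'$, and consequently

$$\mathrm{Tr}(P^n\mathfrak{W}^n B) \ge \eta\tau/2.$$

By lemma I.12 there is $K$ with

$$\mathrm{Tr}\,B \ge \exp\left(\sum_{i=1}^{n} H(P_i\mathfrak{W}_i) - K\sqrt{n}\right).$$

On the other hand

$$\mathrm{Tr}\,B = \sum_{m\in\mathcal{M}}\mathrm{Tr}\,D_m \le \sum_{m\in\mathcal{M}}\mathrm{Tr}\,\Pi^n_{H,f(m),\delta} \le |\mathcal{M}|\exp\left(\sum_{i=1}^{n} H(\mathfrak{W}_i|P_i) + 2\delta\sqrt{n}\right),$$

the last inequality again by lemma I.12, and we are done.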



□ 
Remark II.5 We can strengthen the theorem to the effect that all the $D_m$ are projectors. The proof goes through unchanged but for the construction of the code extension: there we take the support of the above operator $D'$. The trace estimate holds because the trace of a projector is the dimension of its range.

Remark II.6 The above coding theorem (for stationary channels and with slightly weaker bounds) was first proved by Holevo (1998a) (and independently by Schumacher & Westmoreland (1997)), building on ideas of Hausladen et al. (1996) for the pure state channel.



Strong converse 

Theorem II.7 (Strong Converse) For every $\lambda \in (0,1)$ and $\epsilon > 0$ there is $n_0 = n_0(\lambda,\epsilon)$ such that for every $n \ge n_0$ and every nonstationary q-DMC

$$\log N(n,\lambda) \le \sum_{i=1}^{n} C(\mathfrak{W}_i) + n\epsilon.$$

Before proving this we need to follow a short technical digression:



Approximation of channels We have continuum many states on $\mathcal{L}(\mathcal{H})$ to deal with, and even more channels, so we introduce a simple approximation scheme: a partition $\mathfrak{Z}$ of $\mathfrak{S}(\mathcal{L}(\mathcal{H}))$ into $t$ sections $\mathfrak{Z}_1,\ldots,\mathfrak{Z}_t$, each having $\|\cdot\|_1$-diameter at most $\theta > 0$, is called $\theta$-fine. The relation of the parameters $t$ and $\theta$ is:

Lemma II.8 For any $\theta > 0$ there is a $\theta$-fine partition of $\mathfrak{S}(\mathcal{L}(\mathcal{H}))$ into $t \le C\theta^{-d^2}$ sections, with a constant $C$ depending only on $d$.

Proof. The set of states is $\|\cdot\|_1$-isometric to the set of positive $d \times d$-matrices with trace one. This is obviously a compact set of real dimension $d^2 - 1$. It is contained in the set of all selfadjoint matrices with the real and imaginary parts of all its entries in the interval $[-1,1]$, which is geometrically a $d^2$-dimensional cube. Now obviously we may decompose this cube into $(2\sqrt{2}d^3)^{d^2}\theta^{-d^2}$ cubes of edge length $\theta/(d^3\sqrt{2})$. We claim that for two states $\rho$, $\rho'$ in the same small cube $\|\rho - \rho'\|_1 \le \theta$. But this follows from the fact that a matrix with all entries absolutely bounded by $\epsilon$ has all its eigenvalues bounded by $d^2\epsilon$, which is straightforward (and rather crude). □

We close the digression with two definitions: the $\mathfrak{Z}$-type of a state $\rho^n$ is the empirical distribution on sections in which $\mathfrak{Z}_j$ has weight proportional to the number of $\rho_i \in \mathfrak{Z}_j$. The $\mathfrak{Z}$-class of a channel $\mathfrak{W}_i$ is the set of sections $\mathfrak{Z}_j$ which have nonempty intersection with $\mathfrak{W}_i$.

Obviously the number of $\mathfrak{Z}$-types is bounded by $(n+1)^t$, and the number of $\mathfrak{Z}$-classes is bounded by $2^t$.
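The two counting bounds can be checked numerically. The sketch below counts the empirical distributions of $n$ positions over $t$ sections exactly (by the standard stars-and-bars formula) and compares the count with the bound $(n+1)^t$; the function names are illustrative only.

```python
import math

def number_of_types(n, t):
    """Number of ways to distribute n positions over t sections (stars and bars)."""
    return math.comb(n + t - 1, t - 1)

def number_of_classes(t):
    """Number of subsets of the t sections, an upper bound on the number of classes."""
    return 2 ** t

exact = number_of_types(10, 3)   # 66 types for n = 10 positions and t = 3 sections
bound = (10 + 1) ** 3            # the cruder bound (n + 1)^t = 1331
```

The exact count grows only polynomially in $n$ for fixed $t$, which is all that the proof of the strong converse uses.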
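Proof of theorem II.7 Let $(f,D)$ an $(n,\lambda)$-code. Consider a $\theta$-fine partition $\mathfrak{Z}$ of $\mathfrak{S}(\mathcal{L}(\mathcal{H}))$ into $t$ sections and choose representatives $\sigma_j \in \mathfrak{Z}_j$. For every ($\mathfrak{Z}$-)class $\gamma$ let $I_\gamma$ the set of indices $i \in [n]$ with $\mathfrak{W}_i$ of class $\gamma$. Consider the ($\mathfrak{Z}$-)types of the restrictions $f(m)^{I_\gamma}$ of the codewords to the positions $I_\gamma$. For each $\gamma$ with $I_\gamma \neq \emptyset$ there is a type $P_\gamma$ occurring in a fraction of at least $(|I_\gamma|+1)^{-t}$ of the codewords. Successively choosing subcodes we arrive at a code $\mathcal{M}'$ with at least $|\mathcal{M}|\cdot(n+1)^{-t2^t}$ codewords and $f(m)^{I_\gamma}$ of type $P_\gamma$ for all $m \in \mathcal{M}'$, whenever $I_\gamma \neq \emptyset$.

For each $i$, $i \in I_\gamma$ choose states $\rho_{ij} \in \mathfrak{W}_i \cap \mathfrak{Z}_j$ and define a distribution $P_i$ on $\mathfrak{W}_i$ by $P_i(\rho_{ij}) = P_\gamma(j)$. Finally let $\rho_i = P_i\mathfrak{W}_i = \sum_j P_\gamma(j)\rho_{ij}$ and $\sigma_\gamma = \sum_j P_\gamma(j)\sigma_j$.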

For classes 7 with I/7I > n2~^* (which we call good) define (with some 6 > 0) 



c,a^,5+ew\i-,\ 



in £(7^) 



For bad 7 define 11^ = 1 in £(7i)®^^. Then by the weak law lemma [1.13| for ever?/ 7 

Vm G Tr (/(m)^^n7) > 1 - ^ 

and thus defining IIo = II^ we obtain 

2* 

Vm G Tr (/(m)no) > 1 - ttt • 

Now assume that n2~^* is large enough and 6 is small enough so that according to lem- 
mata [1.14| and |A.4| we have for good 7 



Y,H{p^,)+2\I,\e] . 



Trn^ < exp{\I^\{H{a^) +e)) < exp 
Hence we get (collecting the contributions of good and bad classes) 

/ n N 

Tr Ho < exp I ^ ^(Pn) + 2ne + n2-^ log d 



j=i 



Now consider the code {f',D') with /' = f\_M' and = UoDrJIo for ^ By 
the above considerations and lemma it is an {n, A + v^2*/^5^^)-code. Assuming 



^/82^/'^6~^ < by lemma we get 

TrD:„>exp(^/7(2n,|P,)-r2e 
if n is large enough. So we arrive at 

TrHo > J2 ^ l-M'|exp ( J^HmiK 



ne 



meM' 



.1=1 
and thus

\M\ < (n + l)*2'exp (^H{P,W,) - H{mi\Pi)) + 3ne + n2-'\ogd 
< exp J2(H{P,W,) - H{Wi\P,)) + 5ne 

(n N 
C{Wi) + 5ne 

if we can adjust our parameters accordingly: choose for example t = [|logn] with 
9 < f j (which is possible by lemma [II. 8|) , 6 = n^^^, and let n large enough. □ 

Remark II.9 The weak converse is already a consequence of the information bound of Holevo (1973), see theorem A.16, together with subadditivity of quantum mutual information (corollary A.18) and the classical Fano inequality (see theorem A.24).



Refined analysis for stationary channels 

From this point on we restrict ourselves to the finite and stationary case.

Let $W : \mathcal{X} \to \mathfrak{S}(\mathcal{L}(\mathcal{H}))$ a finite q-DMC, mapping $x \in \mathcal{X}$ to the state $W_x$, with a set $\mathcal{X}$, say of cardinality $|\mathcal{X}| = a < \infty$, for a fixed complex Hilbert space $\mathcal{H}$ of finite dimension $d$ (i.e., in slight variation to the previous sections, we label the set of channel states by $\mathcal{X}$). We will have occasion to consider other channels, say $V$, implicitly all with the same $\mathcal{X}$. Note that we drop here the subscript * for state maps, to be closer to the notation in use in the literature.

For an n-block code $(f,D)$ for $W$ we will here interpret $f$ as a mapping from the finite set $\mathcal{M}$ into $\mathcal{X}^n$. The (maximum) error probability of the code then reads as

$$e(f,D) = \max\{1 - \mathrm{Tr}(W^n_{f(m)}D_m) : m \in \mathcal{M}\}.$$

(For $f(m) = x^n \in \mathcal{X}^n$ we adopt the convention $W^n_{f(m)} = W^n_{x^n} = W_{x_1}\otimes\cdots\otimes W_{x_n}$.) The rate of an n-block code is defined as $\frac{1}{n}\log|\mathcal{M}|$. Recall that $N(n,\lambda)$ is the maximal size (i.e. $|\mathcal{M}|$) of an $(n,\lambda)$-code, and define

$$e_{\min}(n,R) = \min\{e(f,D) : (f,D) \text{ is n-block code},\ |\mathcal{M}| \ge \exp(nR)\}.$$

Finally for states $\rho$ and $\nu$, and another channel $V$ and p.d. $P$ on $\mathcal{X}$ let

$$D(\nu\|\rho) = \mathrm{Tr}\left(\nu(\log\nu - \log\rho)\right),$$

$$D(V\|W|P) = \sum_{x\in\mathcal{X}} P(x)D(V_x\|W_x),$$

the (conditional) quantum I-divergence, see appendix A, section Entropy and divergence.

The rewards of our restriction are stronger estimates on $N(n,\lambda)$, and (more interestingly) upper and lower bounds on $e_{\min}(n,R)$, which lead to nontrivial lower and upper bounds on the reliability function of the channel. This extends results of Burnashev & Holevo (1997) from pure state to general channels, and thus gives (partial) answers to two problems posed by Holevo (1998b).



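Some more typicalities We begin with an extension of lemma I.9: define the conditional variance-typical projectors $\Pi^n_{V,W,\delta}(x^n)$ with $x^n \in \mathcal{X}^n$ to be

$$\Pi^n_{V,W,\delta}(x^n) = \bigotimes_{x\in\mathcal{X}}\Pi^{I_x}_{V,W_x,\delta},$$

where $I_x = \{i \in [n] : x_i = x\}$.

Lemma II.10 For every $x^n \in \mathcal{X}^n$ of type $P$, and with $\Pi^n = \Pi^n_{V,W,\delta}(x^n)$

$$\mathrm{Tr}(W^n_{x^n}\Pi^n) \ge 1 - \frac{ad}{\delta^2},$$

$$\Pi^n\exp\left(-nH(W|P) - Kd\sqrt{a}\,\delta\sqrt{n}\right) \le \Pi^n W^n_{x^n}\Pi^n \le \Pi^n\exp\left(-nH(W|P) + Kd\sqrt{a}\,\delta\sqrt{n}\right),$$

$$\mathrm{Tr}\,\Pi^n_{V,W,\delta}(x^n) \le \exp\left(nH(W|P) + Kd\sqrt{a}\,\delta\sqrt{n}\right),$$

$$\mathrm{Tr}\,\Pi^n_{V,W,\delta}(x^n) \ge \left(1 - \frac{ad}{\delta^2}\right)\exp\left(nH(W|P) - Kd\sqrt{a}\,\delta\sqrt{n}\right).$$

Every $\eta$-shadow $B$ of $W^n_{x^n}$ satisfies

$$\mathrm{Tr}\,B \ge \left(\eta - \frac{ad}{\delta^2}\right)\exp\left(nH(W|P) - Kd\sqrt{a}\,\delta\sqrt{n}\right).$$

Proof. The first inequality is just $a$ times the estimate from lemma I.9. The estimate for $\Pi^n_{V,W,\delta}(x^n)W^n_{x^n}\Pi^n_{V,W,\delta}(x^n)$ follows from piecing together the estimates for the $\Pi^{I_x}_{V,W_x,\delta}$ in the same lemma (using $\sum_{x\in\mathcal{X}}\sqrt{P(x)} \le \sqrt{a}$). The rest follows from the shadow bound lemma I.10. □

From this we get the following

Lemma II.11 Let $\delta > 0$ and $x^n \in \mathcal{X}^n$ of type $P$. Then

$$\mathrm{Tr}\left(W^n_{x^n}\Pi^n_{V,PW,\delta\sqrt{a}}\right) \ge 1 - \frac{ad}{\delta^2}.$$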
Proof. Diagonalize $PW = \sum_j q_j\pi_j$, and let the conditional expectation $\kappa_* : \mathcal{L}(\mathcal{H})_* \to \mathcal{L}(\mathcal{H})_*$ be defined by $\kappa_*(\sigma) = \sum_j \pi_j\sigma\pi_j$. We claim that

Indeed let ttj-^ ® • • ■ ®7rj,^ one of the product states constituting <S>x<^x ^vn^iWa:) <5' ^i^^i 



Hence (with l/^l = P{x)n) 



< J2^Vny^PixjJqj\x{l - qj\. 



qj\x) 



< 5^^^qj{l - qj), 

the last inequality by concavity of the map x x{1 — x), and qj = XlxgA' P{^)lj\x- 
Thus we can estimate 



> Tr ii^TWx^Wy^^^wA^n) 
ad 

- ~ 52 ' 

the last line by lemma [1I.10| . □ 

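Of particular interest are the variance-typical projectors with $\delta = 0$, i.e. the $\Pi^n_\nu = \Pi^n_{V,\nu,0}$ and $\Pi^n_V(x^n) = \Pi^n_{V,V,0}(x^n)$, which we call exact types.

For the following fix diagonalizations $\rho = \sum_j q_j\pi_j$ and $W_x = \sum_j q_{j|x}\pi_{xj}$. The commutative algebras $\mathbb{C}[\pi_j|j]$ and $\mathbb{C}[\pi_{xj}|j]$ (which are maximal commutative subalgebras of the commutants $\mathbb{C}[\rho]'$ and $\mathbb{C}[W_x]'$) will be important below.

Lemma II.12 For $\nu \in \mathbb{C}[\rho]'$ we have

$$\Pi^n_\nu\,\rho^{\otimes n}\,\Pi^n_\nu = \Pi^n_\nu\exp\left(-nD(\nu\|\rho) - nH(\nu)\right),$$

$$(n+1)^{-d}\exp(nH(\nu)) \le \mathrm{Tr}\,\Pi^n_\nu \le \exp(nH(\nu)).$$

For $V_x \in \mathbb{C}[W_x]'$ and $x^n \in \mathcal{X}^n$ of type $P$

$$\Pi^n_V(x^n)\,W^n_{x^n}\,\Pi^n_V(x^n) = \Pi^n_V(x^n)\exp\left(-nD(V\|W|P) - nH(V|P)\right),$$

$$(n+1)^{-ad}\exp(nH(V|P)) \le \mathrm{Tr}\,\Pi^n_V(x^n) \le \exp(nH(V|P)).$$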
Proof. The first equation is straightforward. To estimate $\mathrm{Tr}\,\Pi^n_\nu$ let $\rho = \nu$ and note that

$$(n+1)^{-d} \le \mathrm{Tr}(\nu^{\otimes n}\Pi^n_\nu) \le 1.$$

There the upper bound is trivial, whereas the lower bound is by type counting, i.e. observing that in the decomposition $1 = \sum_{\nu'\in\mathbb{C}[\pi_j|j]}\Pi^n_{\nu'}$ there appear at most $(n+1)^d$ nonzero terms, and the fact that for such $\nu'$ the quantity $\mathrm{Tr}(\nu^{\otimes n}\Pi^n_{\nu'})$ is maximized with $\nu' = \nu$ (compare Csiszár & Körner (1981), lemma 1.2.3). The second part of the lemma follows from the first by collecting positions of equal letters in $x^n$. □
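In the commutative case the trace of an exact type projector is simply the number of sequences of that type, so the type counting bounds can be checked by hand. The following minimal sketch does this for one hypothetical type on $d = 3$ letters with $n = 10$; entropies are in nats, and the function names are illustrative only.

```python
import math

def type_class_size(counts):
    """Number of sequences of length n = sum(counts) in which letter j occurs counts[j] times."""
    size = math.factorial(sum(counts))
    for k in counts:
        size //= math.factorial(k)   # stays an integer at every step
    return size

def type_entropy(counts):
    """Entropy (natural logarithm) of the empirical distribution counts / n."""
    n = sum(counts)
    return -sum(k / n * math.log(k / n) for k in counts if k > 0)

counts = [6, 3, 1]                      # a type on d = 3 letters with n = 10
n, d = sum(counts), len(counts)
size = type_class_size(counts)          # 10! / (6! 3! 1!) = 840
upper = math.exp(n * type_entropy(counts))
lower = upper / (n + 1) ** d
```

The class size 840 indeed lies between $(n+1)^{-d}\exp(nH)$ and $\exp(nH)$ for this type.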

Corollary II.13 If $\nu \in \mathbb{C}[\rho]'$ and $\Pi^n_\nu \neq 0$ then

$$(n+1)^{-d}\exp(-nD(\nu\|\rho)) \le \mathrm{Tr}(\rho^{\otimes n}\Pi^n_\nu) \le \exp(-nD(\nu\|\rho)).$$



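Define for a state $\rho$, channel $W$, $x^n \in \mathcal{X}^n$ of type $P$, and a real number $L$:

$$\Pi^n_{\rho,H(\cdot)\le L} = \sum_{\nu\in\mathbb{C}[\pi_j|j],\ H(\nu)\le L}\Pi^n_\nu,$$

$$\Pi^n_{\rho,H(\cdot)\ge L} = \sum_{\nu\in\mathbb{C}[\pi_j|j],\ H(\nu)\ge L}\Pi^n_\nu,$$

$$\Pi^n_{W,H(\cdot|P)\le L}(x^n) = \sum_{V_x\in\mathbb{C}[\pi_{xj}|j],\ H(V|P)\le L}\Pi^n_V(x^n),$$

$$\Pi^n_{W,H(\cdot|P)\ge L}(x^n) = \sum_{V_x\in\mathbb{C}[\pi_{xj}|j],\ H(V|P)\ge L}\Pi^n_V(x^n).$$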



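Lemma II.14 For $\rho$, $W$, $x^n \in \mathcal{X}^n$ of type $P$, and $L$ as above

$$\mathrm{Tr}\left(\Pi^n_{W,H(\cdot|P)\le L}(x^n)\right) \le (n+1)^{ad}\exp(nL),$$

$$\mathrm{Tr}\left(W^n_{x^n}\Pi^n_{W,H(\cdot|P)\le L}(x^n)\right) \ge 1 - (n+1)^{ad}\exp\left(-n\cdot\inf_{H(V|P)\ge L}D(V\|W|P)\right),$$

$$\mathrm{Tr}\left(\rho^{\otimes n}\Pi^n_{\rho,H(\cdot)\ge L}\right) \ge 1 - (n+1)^{d}\exp\left(-n\cdot\min_{H(\nu)\le L}D(\nu\|\rho)\right).$$

Proof. The inequalities all follow from lemma II.12 and corollary II.13 together with type counting. □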
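Lemma II.15 For $\rho$ and $L$ as above

$$\Pi^n_{\rho,H(\cdot)\ge L}\,\rho^{\otimes n}\,\Pi^n_{\rho,H(\cdot)\ge L} \le \Pi^n_{\rho,H(\cdot)\ge L}\exp\left(-n\cdot\min_{H(\nu)\ge L}\left(H(\nu) + D(\nu\|\rho)\right)\right)$$

$$= \Pi^n_{\rho,H(\cdot)\ge L}\exp\left(-nL - n\cdot\min_{H(\nu)=L}D(\nu\|\rho)\right)$$

$$\le \Pi^n_{\rho,H(\cdot)\ge L}\exp\left(-nL - n\cdot\min_{H(\nu)\le L}D(\nu\|\rho)\right).$$

For an $\eta$-shadow $B$ of $\rho^{\otimes n}$

$$\mathrm{Tr}\,B \ge \left(\eta - (n+1)^{d}\exp\left(-n\cdot\min_{H(\nu)\le L}D(\nu\|\rho)\right)\right)\cdot\exp\left(nL + n\cdot\min_{H(\nu)\le L}D(\nu\|\rho)\right).$$

Proof. The first estimate is directly from lemma II.12. To see that the required min is assumed at the boundary of the (convex) region where $H(\nu) \ge L$ observe that the minimized quantity is linear in $\nu$.

For the $\eta$-shadow $B$: note that by lemma II.14 with $\Pi^n = \Pi^n_{\rho,H(\cdot)\ge L}$

$$\mathrm{Tr}\left(\rho^{\otimes n}\Pi^n B\,\Pi^n\right) \ge \eta - (n+1)^{d}\exp\left(-n\cdot\min_{H(\nu)\le L}D(\nu\|\rho)\right),$$

and the rest follows by the estimate on $\Pi^n\rho^{\otimes n}\Pi^n$. □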
Code bounds up to $O(\sqrt{n})$ terms Our first result is a variation of theorem II.4:

Theorem II.16 For every $\lambda \in (0,1)$ there is a constant $K(a,d,\lambda)$ such that for every q-DMC $W$

$$N(n,\lambda) \ge \exp\left(nC(W) - K(a,d,\lambda)\sqrt{n}\right).$$

Proof Let P a p.d. on X with C{W) = H{PW) - H{W\P). Let {f,D) a maximal 
(n, A)-code with the property 

\/meM /(m) G r^^^^^, TtD^ < Tr n;)_^_,(/(m)), 
with S = \ In particular (by lemma [11.10| ) 



Tr < exp {nH{W\P) + {Kdy/^6 + KaV2^\ogd)^ 
Let B = J2m(^M -^"i' claim that for all G T^yp^ 

Tr {W^r^B) >r] = min{l - A, AV32}. 
This is clear if $x^n$ is a code word, and true else, for otherwise we could extend our code with the word $x^n$ and decoding operator

This is exactly as in the proof of theorem [11. 4| . Thus we arrive at 

Tr {{PW fB) > T]/2 
which by lemma |]9| implies the estimate 

TtB> (I - ^) exp {nH{PW) - KdSoV^) ■ 

Choosing 6q = the proof is complete. □ 

The next theorem improves upon our previous converse, theorem II.7:

Theorem II.17 For every $\lambda \in (0,1)$ there is a constant $K(a,d,\lambda)$ such that for every q-DMC $W$ and every $(n,\lambda)$-code $(f,D)$

$$|\mathcal{M}| \le (n+1)^{a}\exp\left(nC(W) + K(a,d,\lambda)\sqrt{n}\right).$$

Proof. We will prove even more: under the additional assumption that all code words are of the same type $P$ (such codes are called constant composition) one has

$$|\mathcal{M}| \le \exp\left(nI(P;W) + K(a,d,\lambda)\sqrt{n}\right)$$

(from which the theorem clearly follows, since there are at most $(n+1)^a$ types). To see this modify the decoder as follows: let

n' _ TT" n TT" 

— '''-V,PW,5''^rn^'-v,PW,S 



with 6 = Then (/, D') is an (n, i±^)-code: 

= Tr iWf^„.)Dm) - Tr {{Wj^^n) - T^V,PW,5Wfim)Ulpw,5)Dm) 

(the last line by lemma [1I.11[ and the tender operator lemma lOf ) . Now from lemma [11.10 



TtD'^ > (^i^ - exp {nH{W\P) - KdV^SV^) 

> exp {nH{W\P) - Kdy/^5y/n) . 

On the other hand Xlmex ^'m — ^vpws^ hence by lemma 

^ Tr < exp {nH{PW) + Rdd^n) 

and we are done. □ 
Reliability function For the finite q-DMC $W$ with capacity $C(W)$ the reliability function $E(R)$ is defined by

$$E(R) = \lim_{\delta\to 0}\,\liminf_{n\to\infty}\,-\frac{1}{n}\log e_{\min}(n,R-\delta).$$

From the previous section we see that $E(R) = 0$ for $R > C(W)$. On the other hand define the greedy bound

$$E_g(R,P) = \max\left\{\min\left\{\mu_i(L,P),\ \tfrac{1}{2}\mu_c(L',P)\right\} : R \le L' - L\right\},$$

with the individual exponent (which may be $+\infty$)

$$\mu_i(L,P) = \inf\{D(V\|W|P) : H(V|P) \ge L\},$$

and the collective exponent (which is finite)

$$\mu_c(L',P) = \min\{D(\rho\|PW) : H(\rho) \le L'\}.$$

Then we have

Theorem II.18 For $n > 0$, a type $P$, and $R < I(P;W)$ there exist constant composition n-block codes $(f,D)$ of type $P$ with

$$|\mathcal{M}| \ge (n+1)^{-ad}\exp(nR)$$

and error probability

$$e(f,D) \le 8(n+1)^{ad}\exp(-nE_g(R,P))$$

if $n \ge n_0(a,d,P)$.

Proof. Let $L, L'$ a pair of numbers with $R \le L' - L$ and

$$E_g(R,P) = \min\left\{\mu_i(L,P),\ \tfrac{1}{2}\mu_c(L',P)\right\}.$$

It is easily seen that we may assume $\mu_i(L,P) \ge \tfrac{1}{2}\mu_c(L',P)$. Also that in this case $L' \le H(PW)$ and $L \ge H(W|P)$, in particular $E_g(R,P) > 0$.

Define $\lambda = 8(n+1)^{ad}\exp(-nE_g(R,P))$ and assume $n$ to be large enough such that $\eta = \lambda^2/32 \le 1 - \lambda$. Let $(f,D)$ a maximal $(n,\lambda)$-code with the additional requirement

$$\forall m \in \mathcal{M}:\quad \mathrm{Tr}\,D_m \le (n+1)^{ad}\exp(nL).$$

We claim that with $B = \sum_{m\in\mathcal{M}} D_m$

$$\forall x^n \text{ of type } P:\quad \mathrm{Tr}(W^n_{x^n}B) \ge \eta.$$

For else we could extend our code by an exceptional $x^n$ and corresponding decoding operator

$$D' = \sqrt{1-B}\,\Pi^n_{W,H(\cdot|P)\le L}(x^n)\sqrt{1-B}.$$

The argument is as in the proof of theorem II.4: observe that $\Pi^n_{W,H(\cdot|P)\le L}(x^n)$, and hence $D'$, satisfies the trace requirement, and

$$\mathrm{Tr}\left(W^n_{x^n}\Pi^n_{W,H(\cdot|P)\le L}(x^n)\right) \ge 1 - (n+1)^{ad}\exp(-n\mu_i(L,P)).$$

Consequently

$$\mathrm{Tr}\left((PW)^{\otimes n}B\right) \ge \eta(n+1)^{-a}$$

and by lemma II.15

$$\mathrm{Tr}\,B \ge \left(\eta(n+1)^{-a} - (n+1)^{d}\exp(-n\mu_c(L',P))\right)\cdot\exp\left(nL' + n\mu_c(L',P)\right) \ge (n+1)^{d}\exp(nL'),$$

from which the estimate on $|\mathcal{M}|$ follows immediately. □

Corollary II.19 For $0 < R < C(W)$

$$E(R) \ge E_g(R) = \max_{P \text{ p.d.}:\ R \le I(P;W)} E_g(R,P).$$

□

Conversely, defining the sphere packing bound

$$E_{sp}(R,P) = \min_{V \text{ channel}:\ I(P;V)\le R} D(V\|W|P)$$

we have



Theorem II.20 For $R > 0$ and $n > 0$ let $(f,D)$ a constant composition n-block code (of type $P$) with

$$|\mathcal{M}| \ge \exp(n(R+\delta)),$$

where $\delta > 0$. Then for the error probability

$$e(f,D) \ge \frac{1}{2}\exp\left(-nE_{sp}(R,P)(1+\delta)\right)$$

if $n \ge n_0(a,d,\delta)$.



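Proof. We can directly apply the original idea of Haroutunian (1968): consider a channel $V : \mathcal{X} \to \mathfrak{S}(\mathcal{L}(\mathcal{H}))$ with $I(P;V) \le R$. From the proof of the strong converse theorem II.17 we see that, regarded as a code for $V$, $e(f,D) \ge 1 - \frac{\delta}{2}$ if $n$ is large enough (we assume $\delta < 1$). I.e. for some message $m \in \mathcal{M}$ and $S_m = 1 - D_m$

$$\mathrm{Tr}(V^n_{f(m)}S_m) \ge 1 - \frac{\delta}{2}.$$

Now generally for two states $\rho$, $\sigma$ and complementary positive operators $S$, $D$ (i.e. $S + D = 1$) one has

$$D(\rho\|\sigma) \ge \mathrm{Tr}(\rho S)\log\frac{\mathrm{Tr}(\rho S)}{\mathrm{Tr}(\sigma S)} + \mathrm{Tr}(\rho D)\log\frac{\mathrm{Tr}(\rho D)}{\mathrm{Tr}(\sigma D)}.$$

This follows immediately from the monotonicity of quantum I-divergence, theorem A.5, applied to the completely positive, trace preserving map

$$\sigma \mapsto \mathrm{Tr}(\sigma S)e_1 + \mathrm{Tr}(\sigma D)e_2.$$

From this we get by elementary operations (with the binary entropy $h$)

$$\mathrm{Tr}(\sigma S) \ge \exp\left(-\frac{D(\rho\|\sigma) + h(\mathrm{Tr}(\rho S))}{\mathrm{Tr}(\rho S)}\right).$$

Applying this to $\rho = V^n_{f(m)}$, $\sigma = W^n_{f(m)}$ and $S = S_m$, $D = D_m$ we find

$$\mathrm{Tr}(W^n_{f(m)}S_m) \ge \exp\left(-\frac{nD(V\|W|P) + h\left(1-\frac{\delta}{2}\right)}{1-\frac{\delta}{2}}\right) \ge \frac{1}{2}\exp\left(-nD(V\|W|P)(1+\delta)\right)$$

if only $\delta$ is small enough (which is no real restriction). Now we choose $V$ such that $D(V\|W|P)$ is minimal. □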
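The step from the monotonicity of the I-divergence to the lower bound on the error probability can be checked numerically in the commuting case, where states are probability vectors and the complementary operators $S$, $D$ are an event and its complement. The sketch below is only such a classical sanity check under these assumptions; the two distributions and the chosen event are hypothetical.

```python
import math

def divergence(p, q):
    """Classical I-divergence D(p || q) in nats (assumes q > 0 wherever p > 0)."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0.0)

def binary_entropy(t):
    """Binary entropy h(t) in nats."""
    return -sum(u * math.log(u) for u in (t, 1.0 - t) if u > 0.0)

p = [0.7, 0.2, 0.1]     # plays the role of rho
q = [0.1, 0.3, 0.6]     # plays the role of sigma
event = [0]             # outcomes belonging to S; the complement belongs to D

p_s = sum(p[i] for i in event)
q_s = sum(q[i] for i in event)

# Two-outcome divergence after coarse graining, bounded by D(p || q) by monotonicity.
coarse = divergence([p_s, 1.0 - p_s], [q_s, 1.0 - q_s])
# Resulting lower bound on the probability of S under q.
lower_bound = math.exp(-(divergence(p, q) + binary_entropy(p_s)) / p_s)
```

In this example the coarse grained divergence stays below $D(p\|q)$, and the probability $q(S) = 0.1$ respects the exponential lower bound, as the general argument predicts.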



Corollary II.21 For $0 < R < C(W)$ (with the possible exception of the leftmost finite value of $E_{sp}$)

$$E(R) \le E_{sp}(R) = \max_{P \text{ p.d.}} E_{sp}(R,P).$$

Proof. To apply the theorem we have just to note the continuity of $E_{sp}$ in $R$, which follows from its convexity. □

Remark II.22 The proof obviously also works for infinite input alphabet, if only we have a strong converse, which indeed we have, by the previous section.

Remark II.23 The reader may wish to apply the techniques of the previous proofs to show that $e(f,D)$ tends to 1 exponentially for rates above the capacity. The results however yield nothing of interest beyond the analysis of Ogawa & Nagaoka (1998).
Holevo bound

An interesting application of our converse theorem II.17 is in a new, and completely elementary, proof of the Holevo bound (theorem A.16):

For a q-DMC $W : \mathcal{X} \to \mathfrak{S}(\mathcal{L}(\mathcal{H}))$, a p.d. $P$ on $\mathcal{X}$ and $D$ an observable on $\mathcal{L}(\mathcal{H})$, say indexed by $\mathcal{Y}$, the composition $D_* \circ W : \mathcal{X} \to \mathcal{Y}$ is a classical channel.

Holevo (1973) considered $C_1 = \max_{P,D} I(P; D_* \circ W)$ (the capacity if one is restricted to tensor product observables!) and proved $C_1 \le C(W)$. For us this is now clear, since all codes for the classical channel $D_* \circ W$ (whose maximal rates are asymptotically just $C_1$) can be interpreted as special channel codes for $W$.

But we can show even a little more, namely Holevo's original information bound $I(P; D_* \circ W) \le I(P;W)$ (from which the capacity estimate clearly follows).

Proof. Assume the opposite, $I(P; D_* \circ W) > I(P;W)$. Then by the well known classical coding theorem (Shannon (1948); alternatively theorem II.4, which by remark II.3 generalizes the classical case) there is to every $\delta > 0$ an infinite sequence of $(n,1/2)$-codes with codewords chosen from a set of variance-typical sequences of $P$ for the channel $D_* \circ W$ with rates exceeding $I(P; D_* \circ W) - \delta$. Restricting to a single type of codewords we find constant composition codes (of type $P_n$) with rate exceeding $I(P; D_* \circ W) - 2\delta$ (if $n$ is large enough).

As already explained these are special channel codes for $W$, so by theorem II.17 (proof) their rates are bounded by $I(P_n;W) + \delta$ (again, $n$ large enough), hence

$$I(P; D_* \circ W) - 2\delta \le I(P_n;W) + \delta.$$

Collecting inequalities we find

$$I(P;W) < I(P; D_* \circ W) \le I(P_n;W) + 3\delta.$$

But since $P_n \to P$ by assumption and by the continuity of $I$ in $P$ (see the appendix), since furthermore $\delta$ is arbitrarily small, we end up with

$$I(P;W) < I(P; D_* \circ W) \le I(P;W),$$

a contradiction. □
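The information bound can be illustrated on the smallest nontrivial example: two equiprobable pure qubit states measured in a rotated orthonormal basis. The sketch below assumes real state vectors, computes the classical mutual information of the measured channel for a scan of bases, and compares it with $I(P;W)$, which for two equiprobable pure states equals the binary entropy of $(1+c)/2$ with $c$ the absolute overlap. All names and the chosen states are illustrative; entropies are in bits.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

def measured_information(states, priors, basis):
    """Mutual information of the classical channel p(y|x) = <e_y|psi_x>^2 (real unit vectors)."""
    cond = [[sum(e[k] * psi[k] for k in range(len(psi))) ** 2 for e in basis]
            for psi in states]
    out = [sum(pr * row[y] for pr, row in zip(priors, cond)) for y in range(len(basis))]
    return entropy_bits(out) - sum(pr * entropy_bits(row) for pr, row in zip(priors, cond))

def rotated_basis(angle):
    """Orthonormal basis of the real plane, rotated by the given angle."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c, s), (-s, c)]

states = [(1.0, 0.0), (math.sqrt(0.5), math.sqrt(0.5))]   # |0> and |+>
priors = [0.5, 0.5]
overlap = abs(sum(a * b for a, b in zip(*states)))
quantum_information = entropy_bits([(1 + overlap) / 2, (1 - overlap) / 2])   # I(P; W)

computational = measured_information(states, priors, rotated_basis(0.0))
best = max(measured_information(states, priors, rotated_basis(math.pi * k / 360))
           for k in range(360))
```

No basis in the scan exceeds $I(P;W) \approx 0.60$ bits; the computational basis yields about 0.31 bits. The scan covers only projective measurements in the real plane, so it illustrates the bound rather than computing the optimum over all observables.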



Open questions 

We left open a number of problems: 

Entangled input Is it possible to exceed the rate $C^{(1)} = C(\varphi_*) = \max_P I(P;\varphi_*)$ by using block codes where not only product states but arbitrary (entangled) states are allowed as "codewords"? We conjecture that the "ultimate" classical information capacity of $\varphi_*$,

$$C = \limsup_{n\to\infty}\frac{1}{n}\max_P I(P;\varphi_*^{\otimes n})$$

equals $C^{(1)}$ (compare Schumacher & Westmoreland (1997)).

Computations Closely related is the issue of constructing a feasible algorithm to numerically compute the quantity $C^{(1)}$, maybe by an adaption of Arimoto's algorithm for computing the capacity of a classical channel (cf. ideas of Nagaoka (1998)). This could be used for experimental tests of whether $C^{(n)} = \frac{1}{n}\max_P I(P;\varphi_*^{\otimes n})$ exceeds $C^{(1)}$.
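For orientation, here is a minimal sketch of the classical algorithm referred to above, in the form usually called the Blahut-Arimoto iteration. It treats only a classical channel given by its transition matrix; the adaption to the quantum quantity is not attempted here. The function name, tolerance and iteration limit are choices of this sketch.

```python
import math

def blahut_arimoto(channel, tol=1e-9, max_iter=10000):
    """Capacity (in nats) of a classical channel with rows channel[x][y] = W(y|x).

    Returns the capacity estimate and the input distribution reached.
    """
    num_inputs, num_outputs = len(channel), len(channel[0])
    p = [1.0 / num_inputs] * num_inputs
    lower = 0.0
    for _ in range(max_iter):
        # Output distribution induced by the current input distribution.
        q = [sum(p[x] * channel[x][y] for x in range(num_inputs))
             for y in range(num_outputs)]
        # c[x] = exp(D(W(.|x) || q)), the multiplicative update factor.
        c = [math.exp(sum(w * math.log(w / q[y])
                          for y, w in enumerate(channel[x]) if w > 0.0))
             for x in range(num_inputs)]
        total = sum(p[x] * c[x] for x in range(num_inputs))
        lower, upper = math.log(total), math.log(max(c))   # bounds on the capacity
        p = [p[x] * c[x] / total for x in range(num_inputs)]
        if upper - lower < tol:
            break
    return lower, p

bsc_capacity, _ = blahut_arimoto([[0.9, 0.1], [0.1, 0.9]])
z_capacity, z_input = blahut_arimoto([[1.0, 0.0], [0.5, 0.5]])
bsc_bits = bsc_capacity / math.log(2)   # about 0.531 bits
z_bits = z_capacity / math.log(2)       # about 0.322 bits
```

Both values agree with the known closed forms: $1 - h(0.1)$ for the binary symmetric channel and $\log_2(5/4)$ for the Z-channel with crossover 1/2. Each step maintains a lower and an upper bound on the capacity, which is what makes the iteration attractive as a template.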



Abstract approach In the proofs so far we relied heavily on the product structure of the n-fold channel. For reasons of better understanding of the foundations, as well as for having a unified framework for proof, it is desirable to have "abstract" coding theorems and converses at one's disposal. What this means is that time structure (blocks, in our case even products) is not used: after all the n-fold use of a channel is just a channel with larger alphabet. This is e.g. how Fano's inequality is used in weak converses. For something closer to our present setting compare Wolfowitz (1964), chapter 7.

• Prove an abstract coding theorem in this spirit! 

• Prove the abstract converse, by exhibiting a usable "packing lemma" , as is known 
in the classical theory. 



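Blowing up Prove a blowing up lemma as in the classical theory (commutative $\mathfrak{A}$), due to Ahlswede et al. (1976)! I suggest the following definition:

Let $\mathfrak{A} = \bigoplus_i \mathcal{L}(\mathcal{H}_i)$ a C*-algebra with $q = \dim_{\mathbb{C}}\mathfrak{A}$, and $\Pi \in \mathfrak{A}^{\otimes n}$ a projector. Define the blow-up of $\Pi$ as

$$\Gamma\Pi = \mathrm{l.c.supp}\{A_{(i)}\Pi A_{(i)}^* : 1 \le i \le n,\ A^*A \le 1\}$$

where $A_{(i)} = 1^{\otimes(i-1)}\otimes A\otimes 1^{\otimes(n-i)}$. The $l$-th blow-up of $\Pi$ is $\Gamma^l\Pi$, defined as

$$\Gamma^l\Pi = \mathrm{l.c.supp}\{A_{(I)}\Pi A_{(I)}^* : I \subset [n],\ |I| = l,\ A \in \mathfrak{A}^{\otimes l},\ A^*A \le 1\}$$

where $A_{(I)} = 1^{\otimes I^c}\otimes A$ (in the right order).

In loose words: $\Gamma^l\Pi$ is the least common support of all images of $\Pi$ under all quantum operations confined to $l$ positions (factors in the tensor product).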

Lemma II.24 The blowing up operation has the following properties:

1. $\Gamma^l\Pi$ is a projector.

2. $\Gamma^l$ is the $l$-fold iteration of $\Gamma$.

3. For $0 \le l \le l'$ one has $\Pi \le \Gamma^l\Pi \le \Gamma^{l'}\Pi$.

4. $\mathrm{Tr}\,\Gamma^l\Pi \le (qn)^l\cdot\mathrm{Tr}\,\Pi$.

Proof. Points (1) and (3) are obvious. For (2) and (4) write $\Pi = \sum_{\pi\in\mathcal{P}}\pi$ with a set $\mathcal{P}$ of (necessarily orthogonal) minimal idempotents. Clearly $\mathrm{Tr}\,\Pi = |\mathcal{P}|$. Then

$$\Gamma^l\Pi = \mathrm{l.c.supp}\{A_{(I)}\pi A_{(I)}^* : \pi \in \mathcal{P},\ I \subset [n],\ |I| = l,\ A \in \mathfrak{A}^{\otimes l},\ A^*A \le 1\}$$

and the supporting subspace¹ of this is

$$\sum_{\pi = |\psi\rangle\langle\psi| \in \mathcal{P}}\mathrm{span}\{A_{(I)}|\psi\rangle : I \subset [n],\ |I| = l,\ A \in \mathfrak{A}^{\otimes l},\ A^*A \le 1\}.$$

But $\mathfrak{A}$ has a linear basis $(A_1,\ldots,A_q)$ which produces by tensor products a basis of length $q^l$ of $\mathfrak{A}^{\otimes l}$. This shows (2), and since there are at most $n^l$ many $I \subset [n]$ of cardinality $l$ we get (4). □

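Conjecture II.25 Let $W$ a fixed q-DMC, $m_W$ the smallest non-zero eigenvalue of the $W_x$, $x^n \in \mathcal{X}^n$, and $B$ a projector. Then

$$\mathrm{Tr}\left(W^n_{x^n}\Gamma^l B\right) \ge \Phi\left(\Phi^{-1}\left(\mathrm{Tr}(W^n_{x^n}B)\right) + \frac{l-1}{\sqrt{n}}\,\alpha\right),$$

with $\alpha = c\,\frac{m_W}{\sqrt{-\ln m_W}}$, where $c > 0$ is a universal constant and $\Phi : \mathbb{R} \to [0,1]$ is the Gaussian distribution function: $\Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-t^2/2}\,dt$.

Among the possible applications would be the transition from weak to strong converses (after Ahlswede & Dueck, cf. Csiszár & Körner (1981), chapter 2.1).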

Reliability function We proved the sphere packing bound and a lower bound on the reliability function which at least shows its positivity for rates below the capacity. For the pure state channel this is matched by random coding and expurgated lower bounds of Burnashev & Holevo (1997). Unfortunately in this case our sphere packing bound is trivial!

We leave as open problems: the proof of a random coding lower bound in the general case (which should enable us to determine the reliability function above a critical rate), and (at least in the pure state case) to find a suitable modification of the sphere packing bound (as the present formulation does not take into account possible noncommutativity).

¹ In $\mathcal{H}_1\oplus\cdots\oplus\mathcal{H}_m$, which we think of $\mathfrak{A} = \bigoplus_{i=1}^{m}\mathcal{L}(\mathcal{H}_i)$ to live on!
Chapter III

Quantum Multiple Access Channels 



The multiway channel with $s$ senders and $r$ receivers in classical information theory was already studied by Shannon (1961). Ahlswede (1971) and Ahlswede (1974a) first determined its capacity region. For a good overview on multiuser communication theory one should consult El Gamal & Cover (1980). In the present chapter we will define the corresponding quantum channel (after recent work by Allahverdyan & Saakian (1997b)), extending the results of the previous chapter: we bound the capacity region (in the limit of vanishing error probability), and (for the multiple access channel, i.e. one receiver) we are able to prove the corresponding coding theorem.

Quantum multiway channels and capacity region 

This is the simplest situation of multi-user communication in general: consider $s$ independent senders, sender $i$ using an alphabet $\mathcal{X}_i$, say with an a priori probability distribution $P_i$. We describe this by the quantum state $\omega_i = \sum_{x_i\in\mathcal{X}_i} P_i(x_i)x_i$ on the commutative C*-algebra $\mathfrak{X}_i = \mathbb{C}\mathcal{X}_i$ generated by the $x_i$, which are mutually orthogonal idempotents (to distinguish these as generators of this algebra we will sometimes write $[x_i]$). The channel is then a map

$$W : \mathcal{X}_1\times\cdots\times\mathcal{X}_s \to \mathfrak{S}(\mathfrak{Y})$$

with a (finite dimensional) C*-algebra $\mathfrak{Y}$, which connects the input $(x_1,\ldots,x_s)$ with the output $W_{x_1\ldots x_s}$. By linear extension we may view $W$ as a completely positive, trace preserving map from $(\mathfrak{X}_1\otimes\cdots\otimes\mathfrak{X}_s)_*$ to $\mathfrak{Y}_*$. The receivers are modelled by compatible *-subalgebras $\mathfrak{Y}_j$ (see appendix A, section Quantum systems).

If all the $W_{x_1\ldots x_s}$ commute with each other (hence have a common diagonalization) the channel is called classical.

For fixed a priori distributions we have the channel state

$$\gamma = \sum_{x_1,\ldots,x_s} P_1(x_1)\cdots P_s(x_s)\,[x_1]\otimes\cdots\otimes[x_s]\otimes W_{x_1\ldots x_s}$$

on $\mathfrak{X}_1\otimes\cdots\otimes\mathfrak{X}_s\otimes\mathfrak{Y}$.
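For a subset $J \subset [s]$ denote $P_J = \bigotimes_{i\in J} P_i$, $\mathcal{X}(J) = \prod_{i\in J}\mathcal{X}_i$, and $\mathfrak{X}(J) = \bigotimes_{i\in J}\mathfrak{X}_i$.

Further define a reduced channel $P_{J^c}W : \mathcal{X}(J) \to \mathfrak{S}(\mathfrak{Y})$ by

$$P_{J^c}W : (x_i|i \in J) \mapsto \sum_{x_i\in\mathcal{X}_i,\ i\in J^c} P_{J^c}(x_i|i \in J^c)\,W_{x_1\ldots x_s}.$$

Note that

$$\mathrm{Tr}_{\mathfrak{X}(J^c)}\gamma = \sum_{\forall i\in J:\ x_i\in\mathcal{X}_i} P_J(x_i|i \in J)\,[x_i|i \in J]\otimes(P_{J^c}W)_{(x_i|i\in J)}.$$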

An n-block code is a collection $(f_1,\ldots,f_s,D_1,\ldots,D_r)$ of maps $f_i : \mathcal{M}_i \to \mathcal{X}_i^n$ and decoding observables $D_j \subset \mathfrak{Y}_j^{\otimes n}$, indexed by $\mathcal{M}'_1\times\cdots\times\mathcal{M}'_s \supset \mathcal{M}_1\times\cdots\times\mathcal{M}_s$. There are $r$ (average) error probabilities of the code, the probability that the receiver $j$ guesses wrongly any one of the sent words, taken over the uniform distribution on the codebooks:

$$\bar{e}_j(f_1,\ldots,f_s,D_j) = 1 - \frac{1}{|\mathcal{M}_1|\cdots|\mathcal{M}_s|}\sum_{\forall i:\ m_i\in\mathcal{M}_i}\mathrm{Tr}\left(W^{\otimes n}(f_1(m_1),\ldots,f_s(m_s))\,D_{j,m_1\ldots m_s}\right).$$

We call $(f_1,\ldots,f_s,D_1,\ldots,D_r)$ an $(n,\lambda)$-code if all $\bar{e}_j(f_1,\ldots,f_s,D_j)$ are at most $\lambda$.

The rates of the code are the $R_i = \frac{1}{n}\log|\mathcal{M}_i|$. A tuple $(R_1,\ldots,R_s)$ is said to be achievable, if for any $\lambda, \delta > 0$ there exists for any large enough $n$ an $(n,\lambda)$-code with $i$-th rate at least $R_i - \delta$. The set of all achievable tuples (which is clearly closed, and convex by the time sharing principle, cf. Csiszár & Körner (1981), lemma 2.2.2) is called the capacity region of the channel.



Outer bounds 



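In the case $r = 1$, $s = 2$ the following theorem was already stated by Allahverdyan & Saakian (1997b), who also gave hints on the proof.

Theorem III.1 (Outer bounds) The capacity region of the quantum multiway channel is contained in the closure of all nonnegative $(R_1,\ldots,R_s)$ satisfying

$$\forall J \subset [s],\ j \in [r]:\quad R(J) = \sum_{i\in J} R_i \le \sum_u q_u I_{\gamma_u}\left(\mathfrak{X}(J)\wedge\mathfrak{Y}_j\,|\,\mathfrak{X}(J^c)\right)$$

for some channel states $\gamma_u$ (belonging to appropriate input distributions) and $q_u \ge 0$, $\sum_u q_u = 1$.

Proof. Consider any $(n,\lambda)$-code $(f_1,\ldots,f_s,D_1,\ldots,D_r)$ with rate tuple $(R_1,\ldots,R_s)$. Then the uniform distribution on the codewords induces a channel state $\gamma$ on the block $(\mathfrak{X}_1\cdots\mathfrak{X}_s\mathfrak{Y})^{\otimes n}$. Its restriction to the $u$-th copy in this tensor power will be denoted $\gamma_u$. Let $j \in [r]$, $J \subset [s]$. By Fano inequality in the form of corollary A.25 we have

$$H\left(\mathfrak{X}^{\otimes n}(J)\,|\,\mathfrak{Y}_j^{\otimes n}\mathfrak{X}^{\otimes n}(J^c)\right) \le 1 + \lambda\cdot nR(J).$$

With

$$H\left(\mathfrak{X}^{\otimes n}(J)\,|\,\mathfrak{Y}_j^{\otimes n}\mathfrak{X}^{\otimes n}(J^c)\right) = H\left(\mathfrak{X}^{\otimes n}(J)\right) - I\left(\mathfrak{X}^{\otimes n}(J)\wedge\mathfrak{Y}_j^{\otimes n}\mathfrak{X}^{\otimes n}(J^c)\right) = nR(J) - I\left(\mathfrak{X}^{\otimes n}(J)\wedge\mathfrak{Y}_j^{\otimes n}\mathfrak{X}^{\otimes n}(J^c)\right)$$

we conclude (with subadditivity of mutual information, corollary A.18) that

$$(1-\lambda)R(J) \le \frac{1}{n} + \frac{1}{n}I_\gamma\left(\mathfrak{X}^{\otimes n}(J)\wedge\mathfrak{Y}_j^{\otimes n}\mathfrak{X}^{\otimes n}(J^c)\right) \le \frac{1}{n} + \frac{1}{n}\sum_{u=1}^{n} I_{\gamma_u}\left(\mathfrak{X}(J)\wedge\mathfrak{Y}_j\mathfrak{X}(J^c)\right).$$

□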



Remark III.2 In the case of classical channels the region described in the theorem is the exact capacity region (i.e. all the rates there are achievable), as was first proved by Ahlswede (1971) and Ahlswede (1974a).

Remark III.3 The numeric computation of the above regions is not yet possible from the given description: we need a bound on the number of different single-letter channel states one has to consider in the convex combinations. For the multiple access channel ($r = 1$) this is easy: by Carathéodory's theorem $s$ will suffice. For general $r$ there are also classical bounds, which carry over unchanged to the quantum case (since the quantum mutual information has properties similar to those of classical mutual information): $r(2^s - 1)$ always suffice, as was observed by Ahlswede (1974b).

Coding theorem for multiple access channels 

With the notation as before for a quantum multiway channel W with one receiver we 
have 

Theorem III.4 An $s$-tuple $(R_1, \ldots, R_s)$ is achievable (i.e. there is an infinite sequence
of $(n, \lambda_n)$-codes with $\lambda_n \to 0$ and rate tuple tending to $(R_1, \ldots, R_s)$), if and only if it is in
the convex hull of the tuples satisfying (for some input distributions which induce a channel
state $\gamma$)

$$\forall J \subset [s]:\qquad R(J) = \sum_{i \in J} R_i \le I_{\gamma}\big(X(J) \wedge \mathfrak{Y} \mid X(J^c)\big).$$

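We shall prove this only in the case $s = 2$; the reader should have no difficulty seeing the
extension to larger numbers. In this case the conditions reduce to

$$R_1 \le I(X_1 \wedge \mathfrak{Y} \mid X_2), \qquad R_2 \le I(X_2 \wedge \mathfrak{Y} \mid X_1), \qquad R_1 + R_2 \le I(\mathfrak{Y} \wedge X_1 X_2).$$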
That these are necessary is of course theorem III.1. For proof of the achievability it is (by
the time sharing principle) sufficient to consider an extreme point of the region described
by the above inequalities for a particular channel state. It is easily seen that w.l.o.g.
$R_1 = I(X_1 \wedge \mathfrak{Y})$, $R_2 = I(X_2 \wedge X_1\mathfrak{Y})$. That this point is achievable follows immediately

from theorem IV.14 and the following theorem, applied with the source coding rates $\tilde{R}_1 = H(X_1 \mid \mathfrak{Y}) + \delta$ and
$\tilde{R}_2 = H(X_2 \mid X_1\mathfrak{Y}) + \delta$.
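As a numerical illustration of such a corner point, the following sketch evaluates $I(X_1 \wedge \mathfrak{Y})$, $I(X_2 \wedge X_1\mathfrak{Y})$ and $I(\mathfrak{Y} \wedge X_1X_2)$ for a toy two-sender channel with qubit output. The channel states `W`, the uniform input distributions and all function names are assumptions made for this example only; they are not taken from the text.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy (in bits) of a density matrix."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]              # drop numerically zero eigenvalues
    return float(-np.sum(ev * np.log2(ev)))

def mac_corner(P1, P2, W):
    """Mutual informations of the channel state induced by independent inputs.

    W[x1][x2] is the output density matrix for the input pair (x1, x2).
    Returns (I(X1 ^ Y), I(X2 ^ X1 Y), I(Y ^ X1 X2)).
    """
    n1, n2 = len(P1), len(P2)
    # Output state when only x1 is known: average over the second sender.
    avg_x1 = [sum(P2[x2] * W[x1][x2] for x2 in range(n2)) for x1 in range(n1)]
    avg = sum(P1[x1] * avg_x1[x1] for x1 in range(n1))

    h_out = entropy(avg)
    h_out_given_x1 = sum(P1[x1] * entropy(avg_x1[x1]) for x1 in range(n1))
    h_out_given_x1x2 = sum(P1[x1] * P2[x2] * entropy(W[x1][x2])
                           for x1 in range(n1) for x2 in range(n2))
    return (h_out - h_out_given_x1,
            h_out_given_x1 - h_out_given_x1x2,
            h_out - h_out_given_x1x2)

def proj(v):
    """Projector onto the normalized vector v."""
    v = np.asarray(v, dtype=float)
    v = v / np.linalg.norm(v)
    return np.outer(v, v)

# Hypothetical toy channel: sender 1 chooses a bit, sender 2 chooses the basis.
W = [[proj([1, 0]), proj([1, 1])],
     [proj([0, 1]), proj([1, -1])]]
P1 = [0.5, 0.5]
P2 = [0.5, 0.5]

i1, i2, i12 = mac_corner(P1, P2, W)
print(f"I(X1 ^ Y)    = {i1:.4f}")
print(f"I(X2 ^ X1 Y) = {i2:.4f}")
print(f"I(Y ^ X1 X2) = {i12:.4f}")
```

The first two numbers add up to the third, so the pair $(i_1, i_2)$ lies on the boundary $R_1 + R_2 = I(\mathfrak{Y} \wedge X_1X_2)$ of the pentagon for this particular channel state.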

Theorem III.5 (Cf. [Csiszár & Körner (1981)], proof of theorem 3.2.3) Let $\lambda, \delta > 0$,
$W$ a quantum multiple access channel with two senders, and $P_i$ probability distributions
on the sender alphabets $\mathcal{X}_i$. Define the $c^2h^1$-source (see chapter IV, section Correlated
quantum sources) $(X_1, X_2, \mathfrak{Y}, \mathcal{X}_1 \times \mathcal{X}_2 \times \mathcal{P}, P)$ on $X_1 \otimes X_2 \otimes \mathfrak{Y}$ by $P(x_1 \otimes x_2 \otimes \pi) = P_1(x_1) P_2(x_2)\, q_{\pi|x_1x_2}$,
where $\mathcal{P}$ is a set of pure states on $\mathfrak{Y}$ and the $q_{\pi|x_1x_2}$ are such
that $W_{x_1x_2} = \sum_{\pi \in \mathcal{P}} q_{\pi|x_1x_2}\, \pi$ (e.g. diagonalize all $W_{x_1x_2}$ and take $\mathcal{P}$ to be the set of all
eigenstates occurring).

Then from any $(n, \lambda)$-coding scheme $(g_1, g_2, D')$ with quantum side information at
the decoder for this source, with rates $\tilde{R}_1, \tilde{R}_2$, one can construct an $(n, 4\lambda)$-code $(f_1, f_2, D)$
for $W$ with rates $R_i \ge H(P_i) - \tilde{R}_i - \delta$, provided $n \ge n_0(|\mathcal{X}_1|, |\mathcal{X}_2|, \ldots)$.

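Proof. Let $g_1 : \mathcal{X}_1^n \to \mathcal{M}_1$ and $g_2 : \mathcal{X}_2^n \to \mathcal{M}_2$ be the encodings, $D'$ the observable on
$\mathbb{C}\mathcal{M}_1 \otimes \mathbb{C}\mathcal{M}_2 \otimes \mathfrak{Y}^{\otimes n}$ indexed by $\mathcal{X}_1^n \times \mathcal{X}_2^n$. Observe that it is of the form

$$D'_{x_1^n x_2^n} = \sum_{m_1 \in \mathcal{M}_1,\, m_2 \in \mathcal{M}_2} [m_1] \otimes [m_2] \otimes D_{m_1 m_2,\, x_1^n x_2^n}.$$

Define for every $(m_1, m_2) \in \mathcal{M}_1 \times \mathcal{M}_2$

$$\mathcal{A}_{m_1} = g_1^{-1}\{m_1\}, \qquad \mathcal{B}_{m_2} = g_2^{-1}\{m_2\}.$$

Assume that the $\mathcal{A}_{m_1}$, $\mathcal{B}_{m_2}$ consist of words of a single type (otherwise one modifies the
coding by also encoding the type of the sequences, increasing the rate negligibly, in the
asymptotics).

Construct now codes $(f_1^{(m_1m_2)}, f_2^{(m_1m_2)}, D^{(m_1m_2)})$ as follows:
$f_1^{(m_1m_2)}$ is the identity on $\mathcal{A}_{m_1}$, $f_2^{(m_1m_2)}$ is the identity on $\mathcal{B}_{m_2}$, and $D^{(m_1m_2)}$ is the observable on $\mathfrak{Y}^{\otimes n}$ indexed by $\mathcal{A}_{m_1} \times \mathcal{B}_{m_2}$
with $D^{(m_1m_2)}_{x_1^n x_2^n} = D_{m_1 m_2,\, x_1^n x_2^n}$.

As in [Csiszár & Körner (1981)], p. 272, we can see that for the error probabilities

$$\sum_{m_1} \sum_{m_2} P_1^n(\mathcal{A}_{m_1})\, P_2^n(\mathcal{B}_{m_2})\, e\big(f_1^{(m_1m_2)}, f_2^{(m_1m_2)}, D^{(m_1m_2)}\big) \le e(g_1, g_2, D'),$$

and again copying from [Csiszár & Körner (1981)] we find that there is one of them
having $e\big(f_1^{(m_1m_2)}, f_2^{(m_1m_2)}, D^{(m_1m_2)}\big) \le 4\lambda$ and rates $R_i \ge H(P_i) - \tilde{R}_i - \delta$, if $n$ is large
enough. □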
Open questions

Random coding. The major drawback of the above method of proof is that it allows
no direct code construction for every point in the capacity region, as does the proof of
[Ahlswede (1974a)] (we needed to invoke the time sharing principle). It seems that this
approach is no longer possible if there are two or more receivers present. The above outer
bounds however we conjecture to be the correct ones (by formal analogy with the classical
case). A proof of the corresponding coding theorem would be highly desirable, possibly
by a cleverly adapted random coding argument (see the proofs of the quantum channel
coding theorem by [Holevo (1998a)] and [Schumacher & Westmoreland (1997)]). It
should be clear that such a proof is far more natural than the one we presented here.
For a proof of the quantum multiple access channel coding theorem which does not rely
on code partitions and reduction to a source coding problem but instead uses iterated
"slicing" of the rate with random code selection, see [Winter (1998a)].
Chapter IV

Quantum Multiple Source Coding

Having investigated in chapter I the problem of quantum source coding we now turn to the
problem of (independent) source coding of possibly dependent sources. In the first section
we will introduce the mathematical model, and venture then to analyze this model as far
as possible (which, as it will turn out, is not very much): we will restrict ourselves mostly
to double sources, proving some general bounds and presenting characteristic examples.
Then we study the particular case that only one of the sources is quantum, the others
being classical. We are thus led to consider the problem of coding with side information,
which for this kind of source we can in part solve. In general, however, one has to
distinguish between multiple source coding and coding with side information.

Correlated quantum sources 

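A multiple ($s$-fold) quantum source is a tuple $(\mathfrak{A}_1, \ldots, \mathfrak{A}_s, \mathcal{P}, P)$ of C*-algebras $\mathfrak{A}_i$ (for
us: finite dimensional), a finite set $\mathcal{P}$ of pure states on $\mathfrak{A} = \mathfrak{A}_1 \otimes \cdots \otimes \mathfrak{A}_s$ and a p.d. $P$
on $\mathcal{P}$.

The average state of the source is the state $P\mathcal{P} = \sum_{\pi \in \mathcal{P}} P(\pi)\pi$ on $\mathfrak{A}$, its marginal restricted to $\mathfrak{A}^{\otimes I} = \bigotimes_{i \in I} \mathfrak{A}_i$
is denoted $P\mathcal{P}|_I$.

We call the source classically correlated if all the states $\pi \in \mathcal{P}$ are product states
with respect to $\mathfrak{A}_1, \ldots, \mathfrak{A}_s$: $\pi = \pi_1 \otimes \cdots \otimes \pi_s$, $\pi_i \in \mathfrak{S}(\mathfrak{A}_i)$. In this case we obtain for
each $J \subset [s]$ a multiple source $((\mathfrak{A}_j \mid j \in J), \mathcal{P}|_J, P)$ by restricting the $\pi \in \mathcal{P}$ to $\mathfrak{A}^{\otimes J}$, i.e.
replacing $\pi$ by $\pi|_J$. Always in this situation we assume w.l.o.g. $\mathcal{P} = \mathcal{P}_1 \times \cdots \times \mathcal{P}_s$.

If in particular $k$ of the $\mathfrak{A}_i$ are classical (i.e. commutative), $l$ are fully quantum (i.e.
full matrix algebras) and the remaining $m$ are arbitrary ("hybrid"), we speak of a $c^kq^lh^m$-source.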

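An $n$-block coding scheme with quantum encoding for a multiple quantum source
$(\mathfrak{A}_1, \ldots, \mathfrak{A}_s, \mathcal{P}, P)$ is a tuple $(\varepsilon_{1*}, \ldots, \varepsilon_{s*}, \delta_*)$ with quantum operations

$$\varepsilon_{i*} : \mathfrak{A}_{i*}^{\otimes n} \to \mathcal{L}(\mathcal{K}_i)_*, \qquad \delta_* : \mathcal{L}(\mathcal{K}_1 \otimes \cdots \otimes \mathcal{K}_s)_* \to \mathfrak{A}_{1*}^{\otimes n} \otimes \cdots \otimes \mathfrak{A}_{s*}^{\otimes n}.$$

An $n$-block coding scheme with arbitrary encoding for a classically correlated (!) multiple
quantum source $(\mathfrak{A}_1, \ldots, \mathfrak{A}_s, \mathcal{P}, P)$ is a tuple $(\varepsilon_{1*}, \ldots, \varepsilon_{s*}, \delta_*)$ with

$\varepsilon_{i*} : \mathcal{P}_i^n \to \mathfrak{S}(\mathcal{L}(\mathcal{K}_i))$ mappings and

$\delta_* : \mathcal{L}(\mathcal{K}_1 \otimes \cdots \otimes \mathcal{K}_s)_* \to \mathfrak{A}_{1*}^{\otimes n} \otimes \cdots \otimes \mathfrak{A}_{s*}^{\otimes n}$ a quantum operation.

Writing $\varepsilon_* = \varepsilon_{1*} \otimes \cdots \otimes \varepsilon_{s*}$ we define the average fidelity and average distortion of
the scheme $(\varepsilon_{1*}, \ldots, \varepsilon_{s*}, \delta_*)$ as expected:

$$F(\varepsilon_{1*}, \ldots, \varepsilon_{s*}, \delta_*) = \sum_{\pi^n \in \mathcal{P}^n} P^n(\pi^n) \cdot \mathrm{Tr}\big((\delta_* \varepsilon_* \pi^n)\, \pi^n\big),$$

and the average distortion $D(\varepsilon_{1*}, \ldots, \varepsilon_{s*}, \delta_*)$ is the corresponding $P^n$-average of the trace norm distance between $\delta_* \varepsilon_* \pi^n$ and $\pi^n$.

If all $\mathfrak{A}_i$ are fully quantum, say $\mathfrak{A}_i = \mathcal{L}(\mathcal{H}_i)$, we can define the entanglement fidelity by

$$F_e(\varepsilon_{1*}, \ldots, \varepsilon_{s*}, \delta_*) = \mathrm{Tr}\Big(\big((\delta_* \varepsilon_* \otimes \mathrm{id})\, \Psi_{P\mathcal{P}}^{\otimes n}\big)\, \Psi_{P\mathcal{P}}^{\otimes n}\Big),$$

where $\Psi_{P\mathcal{P}}$ denotes a purification of the average state $P\mathcal{P}$.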

Quite obviously theorem for these quality measures is still valid. It should be clear
also what we mean by $(n, \lambda)_F$-, $(n, \lambda)_D$-, and $(n, \lambda)_{F_e}$-coding schemes.

The rate tuple $(R_1, \ldots, R_s)$ of the coding scheme is defined by $R_i = \frac{1}{n} \log \dim \mathcal{K}_i$. A
tuple $(R_1, \ldots, R_s)$ is called (quantum, $F$)-achievable if there is a sequence of $(n, \lambda_n)_F$-coding
schemes with rate tuples converging to $(R_1, \ldots, R_s)$ and $\lambda_n \to 0$ as $n \to \infty$. The
set $\mathbf{R}_{q,F}$ of all (quantum, $F$)-achievable rate tuples is called (quantum, $F$)-rate region.

Analogously (arbitrary, $F$)-, the same with $D$, and (quantum, $F_e$)-achievability are
defined, with their respective rate regions $\mathbf{R}_{a,F}$, $\mathbf{R}_{q,D}$, $\mathbf{R}_{a,D}$ and $\mathbf{R}_{q,F_e}$.

It is clear from the definition that the rate regions are closed, convex (by the time
sharing principle) and right upper closed (increasing some of the $R_i$ does not leave the
rate region). Also we have the following quite obvious inclusions:

$$\mathbf{R}_{q,F_e} \subset \mathbf{R}_{q,F} \subset \mathbf{R}_{a,F}, \qquad \mathbf{R}_{q,D} \subset \mathbf{R}_{a,D}.$$

Note that the different rate regions depend on the ensemble $(\mathcal{P}, P)$; only $\mathbf{R}_{q,F_e}$ obviously
depends only on the average state $P\mathcal{P}$. For the others we will present evidence that
they do in fact depend on further properties of $(\mathcal{P}, P)$ besides $P\mathcal{P}$.
Some general bounds. Consider first a double source, quantum encoding with average
fidelity:

Theorem IV.1 Let $(\mathfrak{A}_1, \mathfrak{A}_2, \mathcal{P}, P)$ be a double quantum source and $(R_1, R_2)$ a (quantum, $D$)-achievable
pair. Then with the average state $P\mathcal{P}$ on $\mathfrak{A} = \mathfrak{A}_1 \otimes \mathfrak{A}_2$

$$R_1 + R_2 \ge H(\mathfrak{A}_1\mathfrak{A}_2), \qquad R_1 \ge H(\mathfrak{A}_1 \mid \mathfrak{A}_2), \qquad R_2 \ge H(\mathfrak{A}_2 \mid \mathfrak{A}_1).$$

Proof. The first inequality follows from the converse to source coding, in the generalized
form of theorem I.22. For the second, consider an $(n, \lambda)_D$-coding scheme $(\varepsilon_{1*}, \varepsilon_{2*}, \delta_*)$
with quantum encoding which has rate pair $(R_1 + \epsilon, R_2 + \epsilon)$. Modify the coding scheme
as follows (for $n$ large enough):

$\mathfrak{A}_1$ encodes just as before, but $\mathfrak{A}_2$ uses Schumacher's data compression to encode
his part in $H(\mathfrak{A}_2) + \epsilon$ qubits per symbol and with $D \le \lambda$. The decoder first "unpacks"
the signal from $\mathfrak{A}_2$ and then applies $\mathfrak{A}_2$'s previous encoding $\varepsilon_{2*}$. After that she applies
her previous decoding $\delta_*$. Let us estimate the average trace norm distortion of the new
scheme: since $\|\cdot\|_1$ does not increase under quantum operations, the triangle inequality
shows that it is at most $2\lambda$. Thus from theorem I.22 it follows that $R_1 + H(\mathfrak{A}_2) + 2\epsilon \ge H(\mathfrak{A})$,
and since $\epsilon$ is arbitrarily small we get the second inequality. The third one is exactly
symmetrical. □

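Example IV.2 (Cloned wheel) Consider the $c^0q^2$-source $(\mathfrak{A}_1, \mathfrak{A}_2, \mathcal{P}, P)$ given by $\mathfrak{A}_1 = \mathfrak{A}_2 = \mathcal{L}(\mathbb{C}^2)$,
and $P$ is equidistributed on

$$\mathcal{P} = \{|00\rangle\langle 00|,\ |11\rangle\langle 11|,\ |{+}{+}\rangle\langle {+}{+}|,\ |{-}{-}\rangle\langle {-}{-}|\},$$

where $\{|0\rangle, |1\rangle\}$ is an orthonormal basis of $\mathbb{C}^2$ and $|+\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$, $|-\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$.
So the average state of the source is

$$P\mathcal{P} = \frac{1}{4}\big(|00\rangle\langle 00| + |11\rangle\langle 11| + |{+}{+}\rangle\langle {+}{+}| + |{-}{-}\rangle\langle {-}{-}|\big)$$

and clearly the marginals are

$$P\mathcal{P}|_{\mathfrak{A}_1} = P\mathcal{P}|_{\mathfrak{A}_2} = \frac{1}{2}\mathbb{1}.$$

Since each of the sent pairs is clearly invariant under exchange of $\mathfrak{A}_1$ and $\mathfrak{A}_2$ we see that
so is $P\mathcal{P}$, i.e. $P\mathcal{P}$ is supported on the three-dimensional symmetric subspace $\mathrm{Sym}^2(\mathbb{C}^2)$
of $\mathbb{C}^2 \otimes \mathbb{C}^2$. In fact, an orthonormal basis of $\mathrm{Sym}^2(\mathbb{C}^2)$ is given by the triplet Bell states

$$|\Phi^+\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle), \qquad |\Phi^-\rangle = \frac{1}{\sqrt{2}}(|00\rangle - |11\rangle), \qquad |\Psi^+\rangle = \frac{1}{\sqrt{2}}(|01\rangle + |10\rangle)$$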
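and it is readily checked that

$$P\mathcal{P} = \frac{1}{2}|\Phi^+\rangle\langle\Phi^+| + \frac{1}{4}|\Phi^-\rangle\langle\Phi^-| + \frac{1}{4}|\Psi^+\rangle\langle\Psi^+|.$$

Thus $H(P\mathcal{P}) = 3/2$ and it is clear from the previous theorem IV.1 that with quantum
encoding one gets $R_1 + R_2 \ge 3/2$, $R_1, R_2 \ge 1/2$:

$$\mathbf{R}_{q,F} \subset \{(R_1, R_2) : R_1, R_2 \ge 1/2,\ R_1 + R_2 \ge 3/2\}.$$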
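The entropies used in this example can be checked numerically. The sketch below builds the average state of the cloned wheel source from its four signal states and evaluates the joint, marginal and conditional entropies; the helper names (`entropy`, `cloned_pair`, `partial_trace_second`) are introduced here for illustration only.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy (in bits) of a density matrix."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]              # drop numerically zero eigenvalues
    return float(-np.sum(ev * np.log2(ev)))

zero = np.array([1.0, 0.0])
one = np.array([0.0, 1.0])
plus = (zero + one) / np.sqrt(2)
minus = (zero - one) / np.sqrt(2)

def cloned_pair(v):
    """Projector onto the two-qubit product vector v (x) v."""
    vv = np.kron(v, v)
    return np.outer(vv, vv)

def partial_trace_second(rho):
    """Marginal on the first qubit of a two-qubit density matrix."""
    return np.einsum("ijkj->ik", rho.reshape(2, 2, 2, 2))

# Average state of the source: equal weights on the four cloned states.
avg = sum(cloned_pair(v) for v in (zero, one, plus, minus)) / 4
marginal = partial_trace_second(avg)

eigenvalues = np.sort(np.linalg.eigvalsh(avg))
h_joint = entropy(avg)
h_marginal = entropy(marginal)
h_conditional = h_joint - h_marginal

print("eigenvalues of the average state:", np.round(eigenvalues, 6))
print("H(A1 A2) =", round(h_joint, 6))
print("H(A1|A2) =", round(h_conditional, 6))
```

The nonzero eigenvalues are the weights of the three triplet Bell states, and the two entropies are the quantities that enter the bounds of theorem IV.1 for this source.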

This might appear strange: naively, in the coding $\mathfrak{A}_2$ (say) is unnecessary, since its
state is identical to that of $\mathfrak{A}_1$ (which would mean that the uncertainty of the state of $\mathfrak{A}_2$
given that of $\mathfrak{A}_1$ is zero). So let's try the following coding scheme: $\mathfrak{A}_2$ transmits nothing,
whereas $\mathfrak{A}_1$ transmits his state $\pi$ faithfully using one qubit. But the task of the decoder
is to reconstruct the total state, i.e. $\pi \otimes \pi$, which is clearly impossible by the no-cloning
theorem. So we see that there is indeed a sense in the above inequalities.

However, in the model with arbitrary encoding, the first encoder can replace his state
$\pi$ by $\pi \otimes \pi$ and code it into (asymptotically) 3/2 qubits per symbol using Schumacher's
quantum coding. Hence

$$\mathbf{R}_{a,F} = \{(R_1, R_2) : R_1, R_2 \ge 0,\ R_1 + R_2 \ge 3/2\},$$

and thus we learn:

In general $\mathbf{R}_{a,F}$ and $\mathbf{R}_{q,F}$ are different.



Remark IV.3 In the proof of theorem IV.1 a coding theorem (Schumacher's) was used.
Thus, to prove lower bounds for more than two sources, we need some coding theorem for
correlated quantum sources.

Interestingly we can prove directly lower bounds on the resources needed for schemes with
quantum encoding having high entanglement fidelity. We employ for this the following
concepts from [Schumacher (1996)], [Schumacher & Nielsen (1996)]:

For a quantum operation $\varphi_* : \mathcal{L}(\mathcal{H})_* = \mathfrak{A}_* \to \mathfrak{A}_*$ and a state $\rho$ on $\mathfrak{A}$ choose a
purification $\Psi_\rho$ of $\rho$ on the extended system $\mathfrak{A} \otimes \mathfrak{R}$ ($\mathfrak{R}$ for reference system). The entropy
exchange^1 is defined as

$$S_e(\rho; \varphi_*) = H\big((\varphi_* \otimes \mathrm{id}_{\mathfrak{R}*})\Psi_\rho\big)$$

and [Schumacher (1996)] shows that it does not depend on the purification chosen. It
can be seen as a measure for the quantum information exchange between system and
environment.

Thus it is natural to define the coherent information (after [Schumacher & Nielsen (1996)]) as

$$I_e(\rho; \varphi_*) = H(\varphi_*\rho) - S_e(\rho; \varphi_*).$$

From [Barnum et al. (1998)] we take the following lemma, which is a direct consequence
of the quantum Fano inequality from [Schumacher (1996)].

^1 We adopt the name for this following [Schumacher (1996)] and general physical fashion.
Lemma IV.4 Let $\varphi_*, \psi_*$ be quantum operations on the system $\mathfrak{A}$, $\rho$ a state on $\mathfrak{A}$ and denote
$d^2 = \dim_{\mathbb{C}} \mathfrak{A}$. Then

$$H(\rho) \le I_e(\rho; \varphi_*) + 2 + 4\big(1 - F_e(\rho; \psi_* \circ \varphi_*)\big) \log d.$$

□

We are now ready to prove 

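Lemma IV.5 (Weak subadditivity of coherent information) Let $\rho$ be a state on $\mathfrak{A}_1 \otimes \mathfrak{A}_2$
with marginals $\rho_1, \rho_2$, and $\varphi_{1*}, \varphi_{2*}$ quantum operations on $\mathfrak{A}_1, \mathfrak{A}_2$, respectively. Then

$$I_e(\rho; \varphi_{1*} \otimes \varphi_{2*}) \le I_e(\rho_1; \varphi_{1*}) + H(\rho_2).$$

Proof. Introduce environment systems $\mathfrak{E}_1, \mathfrak{E}_2$, pure "null" states $\tau_1$ on $\mathfrak{E}_1$, $\tau_2$ on $\mathfrak{E}_2$ and
unitaries $U_1, U_2$ on the underlying Hilbert spaces of $\mathfrak{A}_1 \otimes \mathfrak{E}_1$, $\mathfrak{A}_2 \otimes \mathfrak{E}_2$, respectively, such that

$$\varphi_{1*}(\sigma) = \mathrm{Tr}_{\mathfrak{E}_1}\big(U_1(\sigma \otimes \tau_1)U_1^*\big), \qquad \varphi_{2*}(\sigma) = \mathrm{Tr}_{\mathfrak{E}_2}\big(U_2(\sigma \otimes \tau_2)U_2^*\big)$$

(which is possible by Stinespring's theorem A.1). Now what we have to prove (using $\Psi_\rho$, with $\mathfrak{A}_2 \otimes \mathfrak{R}$ as reference system, as purification of $\rho_1$) is

$$H\big((\varphi_{1*} \otimes \varphi_{2*})\rho\big) - H\big((\varphi_{1*} \otimes \varphi_{2*} \otimes \mathrm{id}_{\mathfrak{R}*})\Psi_\rho\big) \le H(\varphi_{1*}\rho_1) - H\big((\varphi_{1*} \otimes \mathrm{id}_{\mathfrak{A}_2*} \otimes \mathrm{id}_{\mathfrak{R}*})\Psi_\rho\big) + H(\rho_2).$$

Defining operations

$$E_{1*} = (U_1 \cdot U_1^*) \otimes \mathrm{id}_{\mathfrak{E}_2*} \otimes \mathrm{id}_{\mathfrak{A}_2*} \otimes \mathrm{id}_{\mathfrak{R}*}, \qquad E_{2*} = \mathrm{id}_{\mathfrak{E}_1*} \otimes \mathrm{id}_{\mathfrak{A}_1*} \otimes (U_2 \cdot U_2^*) \otimes \mathrm{id}_{\mathfrak{R}*}$$

on $\mathfrak{E}_1 \otimes \mathfrak{A}_1 \otimes \mathfrak{E}_2 \otimes \mathfrak{A}_2 \otimes \mathfrak{R}$, and the state $\sigma = \tau_1 \otimes \tau_2 \otimes \Psi_\rho$ we can write this as

$$H_{E_{1*}E_{2*}\sigma}(\mathfrak{A}_1\mathfrak{A}_2) + H_{E_{1*}\sigma}(\mathfrak{A}_1\mathfrak{A}_2\mathfrak{R}) \le H_{E_{1*}E_{2*}\sigma}(\mathfrak{A}_1\mathfrak{A}_2\mathfrak{R}) + H_{E_{1*}\sigma}(\mathfrak{A}_1) + H_{\sigma}(\mathfrak{A}_2).$$

Notice that all the states here are pure! Thus by theorem A.12

$$H_{E_{1*}E_{2*}\sigma}(\mathfrak{A}_1\mathfrak{A}_2) = H_{E_{1*}E_{2*}\sigma}(\mathfrak{E}_1\mathfrak{E}_2\mathfrak{R})$$
$$H_{E_{1*}E_{2*}\sigma}(\mathfrak{A}_1\mathfrak{A}_2\mathfrak{R}) = H_{E_{1*}E_{2*}\sigma}(\mathfrak{E}_1\mathfrak{E}_2)$$
$$H_{E_{1*}\sigma}(\mathfrak{A}_1\mathfrak{A}_2\mathfrak{R}) = H_{E_{1*}\sigma}(\mathfrak{E}_1\mathfrak{E}_2) = H_{E_{1*}\sigma}(\mathfrak{E}_1) = H_{E_{1*}E_{2*}\sigma}(\mathfrak{E}_1)$$

(the last step since $E_{1*}\sigma|_{\mathfrak{E}_2}$ is pure and $E_{2*}$ acts trivially on $\mathfrak{E}_1$), and our inequality
transforms to

$$H_{E_{1*}E_{2*}\sigma}(\mathfrak{E}_1\mathfrak{E}_2\mathfrak{R}) + H_{E_{1*}E_{2*}\sigma}(\mathfrak{E}_1) \le H_{E_{1*}E_{2*}\sigma}(\mathfrak{E}_1\mathfrak{E}_2) + H_{E_{1*}\sigma}(\mathfrak{A}_1) + H_{E_{1*}\sigma}(\mathfrak{A}_2).$$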
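Here with strong subadditivity of entropy (theorem A.9) the left hand side can be estimated by

$$H_{E_{1*}E_{2*}\sigma}(\mathfrak{E}_1\mathfrak{E}_2) + H_{E_{1*}E_{2*}\sigma}(\mathfrak{E}_1\mathfrak{R})$$

and we are done if we can prove that

$$H_{E_{1*}E_{2*}\sigma}(\mathfrak{E}_1\mathfrak{R}) \le H_{E_{1*}\sigma}(\mathfrak{A}_1) + H_{E_{1*}\sigma}(\mathfrak{A}_2).$$

But $E_{1*}\sigma|_{\mathfrak{A}_1\mathfrak{A}_2\mathfrak{E}_1\mathfrak{R}}$ is pure, so again by theorem A.12

$$H_{E_{1*}\sigma}(\mathfrak{A}_1) = H_{E_{1*}\sigma}(\mathfrak{A}_2\mathfrak{E}_1\mathfrak{R}).$$

And since $E_{2*}$ acts trivially on $\mathfrak{A}_1\mathfrak{E}_1\mathfrak{R}$ we have

$$H_{E_{1*}E_{2*}\sigma}(\mathfrak{E}_1\mathfrak{R}) = H_{E_{1*}\sigma}(\mathfrak{E}_1\mathfrak{R})$$

which renders our last inequality equivalent to

$$H_{E_{1*}\sigma}(\mathfrak{E}_1\mathfrak{R}) - H_{E_{1*}\sigma}(\mathfrak{A}_2) \le H_{E_{1*}\sigma}(\mathfrak{A}_2\mathfrak{E}_1\mathfrak{R}),$$

and this is the triangle inequality, theorem A.13. □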
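The two entropy inequalities that carry this proof, strong subadditivity $H(ABC) + H(B) \le H(AB) + H(BC)$ and the triangle inequality $|H(A) - H(BC)| \le H(ABC)$, can be sanity-checked numerically. The sketch below does so for random three-qubit states; the restriction to qubits, the random state construction and the function names are assumptions of this illustration, and a numerical check is of course no substitute for the theorems cited above.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy (in bits) of a density matrix."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]              # drop numerically zero eigenvalues
    return float(-np.sum(ev * np.log2(ev)))

def random_state(dim, rng):
    """Random full-rank density matrix G G^* / Tr(G G^*)."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def marginals(rho):
    """Reduced states of a three-qubit state on A, B, AB, BC and ABC."""
    t = rho.reshape(2, 2, 2, 2, 2, 2)    # indices (a, b, c, a', b', c')
    return {
        "A": np.einsum("abcdbc->ad", t),
        "B": np.einsum("abcaec->be", t),
        "AB": np.einsum("abcdec->abde", t).reshape(4, 4),
        "BC": np.einsum("abcaef->bcef", t).reshape(4, 4),
        "ABC": rho,
    }

def check(rho, tol=1e-9):
    """Return (strong subadditivity holds, triangle inequality holds)."""
    h = {name: entropy(m) for name, m in marginals(rho).items()}
    ssa = h["ABC"] + h["B"] <= h["AB"] + h["BC"] + tol
    triangle = abs(h["A"] - h["BC"]) <= h["ABC"] + tol
    return ssa, triangle

rng = np.random.default_rng(0)
results = [check(random_state(8, rng)) for _ in range(20)]
print("all checks passed:", all(s and t for s, t in results))
```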



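Remark IV.6 Subadditivity

$$I_e(\rho; \varphi_{1*} \otimes \varphi_{2*}) \le I_e(\rho_1; \varphi_{1*}) + I_e(\rho_2; \varphi_{2*}),$$

which is by $I_e(\rho_2; \varphi_{2*}) \le H(\rho_2)$ stronger than our lemma, and which one would expect of
an information, actually fails; see [Barnum et al. (1998)].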



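Theorem IV.7 Let $(\mathfrak{A}_1, \ldots, \mathfrak{A}_s, \mathcal{P}, P)$ be a multiple quantum source with $\mathfrak{A}_i = \mathcal{L}(\mathcal{H}_i)$ and
$(R_1, \ldots, R_s)$ a (quantum, $F_e$)-achievable tuple. Then

$$\forall I \subset [s]:\qquad \sum_{i \in I} R_i \ge H\big(\mathfrak{A}(I) \mid \mathfrak{A}(I^c)\big) = H(P\mathcal{P}) - H(P\mathcal{P}|_{I^c}).$$

Proof. Let an $(n, \lambda)_{F_e}$-coding scheme $(\varepsilon_{1*}, \ldots, \varepsilon_{s*}, \delta_*)$ with rate tuple $(R_1, \ldots, R_s)$ be
given. Denote $d = \dim \mathcal{H}$, where $\mathcal{H} = \mathcal{H}_1 \otimes \cdots \otimes \mathcal{H}_s$.

We may think of $\varepsilon_{i*}$ as acting on $\mathfrak{A}_{i*}^{\otimes n}$ by embedding the coding space $\mathcal{L}(\mathcal{K}_i)_*$. Thus we
can apply for $I \subset [s]$ lemma IV.4 to $\varphi_* = \varphi_{1*} \otimes \varphi_{2*}$ (with $\varphi_{1*} = \bigotimes_{i \in I} \varepsilon_{i*}$, $\varphi_{2*} = \bigotimes_{i \in I^c} \varepsilon_{i*}$)
and $\psi_* = \delta_*$, and obtain

$$nH(P\mathcal{P}) \le I_e\big((P\mathcal{P})^{\otimes n}; \varphi_{1*} \otimes \varphi_{2*}\big) + 2 + 4n\lambda \log d \le I_e\big((P\mathcal{P}|_I)^{\otimes n}; \varphi_{1*}\big) + nH(P\mathcal{P}|_{I^c}) + 2 + 4n\lambda \log d$$

(using weak subadditivity of the coherent information). Since trivially

$$I_e\big((P\mathcal{P}|_I)^{\otimes n}; \varphi_{1*}\big) \le n \sum_{i \in I} R_i$$

we get the theorem in the limit of $n \to \infty$ and $\lambda \to 0$. □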

The following example shows that our nice theorem IV.1 is too weak, at least for nonclassically
correlated sources. At the same time it shows that also theorem IV.7 is too weak.



Example IV.8 (EPR source) Consider the source $(\mathfrak{A}_1, \mathfrak{A}_2, \mathcal{P}, P)$ with $\mathfrak{A}_1 = \mathfrak{A}_2 = \mathcal{L}(\mathbb{C}^2)$,
$\mathcal{P} = \{|\Phi^+\rangle\langle\Phi^+|, |\Phi^-\rangle\langle\Phi^-|\}$ (two of the Bell states) and $P$ equidistributed on
$\mathcal{P}$. Clearly

$$P\mathcal{P} = \frac{1}{2}|\Phi^+\rangle\langle\Phi^+| + \frac{1}{2}|\Phi^-\rangle\langle\Phi^-| = \frac{1}{2}|00\rangle\langle 00| + \frac{1}{2}|11\rangle\langle 11|$$

and both marginals equal $\frac{1}{2}\mathbb{1}$. Theorems IV.1 and IV.7 both give only the lower bound
$R_1 + R_2 \ge 1$. But we will prove that in fact

$$\mathbf{R}_{q,F_e} \subset \mathbf{R}_{q,F} \subset \{(R_1, R_2) : R_1, R_2 \ge 1/2\}.$$

To see this let an $(n, \lambda)_F$-coding scheme $(\varepsilon_{1*}, \varepsilon_{2*}, \delta_*)$ be given with rate pair $(1, R_2)$, the
first encoder being the identity. Now imagine that two people want to use this scheme
to transmit information: the sender encodes 0-1-sequences as sequences of $|\Phi^+\rangle\langle\Phi^+|$ and
$|\Phi^-\rangle\langle\Phi^-|$, giving the according shares of these entangled states to the two encoders. The
receiver measures the decoded states in (the tensor power of) the basis $\{|\Phi^+\rangle, |\Phi^-\rangle\}$, call
the corresponding observable $D$. The transmission rate of this system clearly is 1, with
average error probability bounded by $\lambda$:



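[Figure: the sender's signal $\pi^n \in \mathcal{P}^n$ is shared between the two encoders, the identity (rate $R_1 = 1$) and $\varepsilon_{2*}$ (rate $R_2$); the receiver applies the observable $D$ to the decoded state.]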



Allowing that the sender cooperates with the encoder $\varepsilon_{2*}$, and the receiver with the
decoder $\delta_*$, can only increase the transmission rate. We may describe the new situation
in a different, equivalent way: the two encoders get the $n$-th tensor power of the maximally
entangled state $|\Phi^+\rangle\langle\Phi^+|$, while the second encoder, before performing his $\varepsilon_{2*}$, does the
message encoding (!) for the sender. This is done with the help of the phase flip operator
$\sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$
on $\mathbb{C}^2$, as it is readily checked that $(\mathrm{id} \otimes \sigma_3)|\Phi^+\rangle = |\Phi^-\rangle$. But here the first encoder becomes
superfluous: thus we can assume that initially sender and receiver share $n$ maximally
entangled pairs $|\Phi^+\rangle\langle\Phi^+|$, and the second encoder (viz., the sender!) transmits $nR_2$
qubits to the receiver. This is exactly the situation of superdense coding, invented by
[Bennett & Wiesner (1992)]: and it is well known that the maximal transmission rate
in this situation is $2R_2$, forcing $R_2 \ge 1/2$ in the limit of $n \to \infty$, $\lambda \to 0$. Of course
symmetrically for $R_1$.
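The elementary facts used in this example are easy to verify numerically: the phase flip on the second qubit maps $|\Phi^+\rangle$ to $|\Phi^-\rangle$, the average state of the EPR source is $\frac{1}{2}|00\rangle\langle 00| + \frac{1}{2}|11\rangle\langle 11|$, and its conditional entropy vanishes, which is why theorem IV.1 yields no more than $R_1 + R_2 \ge 1$. The variable and function names below are chosen for this illustration only.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy (in bits) of a density matrix."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]              # drop numerically zero eigenvalues
    return float(-np.sum(ev * np.log2(ev)))

# Bell states in the basis |00>, |01>, |10>, |11>.
phi_plus = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
phi_minus = np.array([1.0, 0.0, 0.0, -1.0]) / np.sqrt(2)

# Phase flip acting on the second qubit only.
sigma3 = np.diag([1.0, -1.0])
flipped = np.kron(np.eye(2), sigma3) @ phi_plus

# Average state of the EPR source and its marginal on the first qubit.
avg = 0.5 * np.outer(phi_plus, phi_plus) + 0.5 * np.outer(phi_minus, phi_minus)
marginal = np.einsum("ijkj->ik", avg.reshape(2, 2, 2, 2))

h_joint = entropy(avg)
h_marginal = entropy(marginal)
h_conditional = h_joint - h_marginal

print("phase flip maps Phi+ to Phi-:", np.allclose(flipped, phi_minus))
print("H(A1 A2) =", round(h_joint, 6))
print("H(A1|A2) =", round(h_conditional, 6))
```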

We can note the two lessons we learned: 



Theorem IV.1 is too weak.

Theorem IV.7 is too weak.



The last example shows the difference between average and entanglement fidelity: 

Example IV.9 (Cloned cross) Consider the source $(\mathfrak{A}_1, \mathfrak{A}_2, \mathcal{P}, P)$ with $\mathfrak{A}_1 = \mathfrak{A}_2 = \mathcal{L}(\mathbb{C}^2)$
and $P$ equidistributed on $\mathcal{P} = \{|00\rangle\langle 00|, |11\rangle\langle 11|\}$. Clearly

$$P\mathcal{P} = \frac{1}{2}|00\rangle\langle 00| + \frac{1}{2}|11\rangle\langle 11|$$

with both marginals equal to $\frac{1}{2}\mathbb{1}$. A natural purification of this source would be by the
GHZ-state $\frac{1}{\sqrt{2}}(|000\rangle + |111\rangle)$, invented by [Greenberger et al. (1990)] to extend Bell's
theorem to multi-party entanglement.

Since the average state is the same as in the EPR source we have

$$\mathbf{R}_{q,F_e} \subset \{(R_1, R_2) : R_1, R_2 \ge 1/2\}.$$

On the other hand it is obvious that

$$\mathbf{R}_{q,F} = \{(R_1, R_2) : R_1, R_2 \ge 0,\ R_1 + R_2 \ge 1\}.$$

It is clear from theorem IV.1 that $R_1 + R_2 \ge 1$ is necessary (even with arbitrary encoding).
And also one sees easily that $R_1 = 1$, $R_2 = 0$ is (quantum, $F$)-achievable: $\mathfrak{A}_2$ sends
nothing, whereas $\mathfrak{A}_1$ transmits his qubit faithfully, the decoder has just to copy it to
obtain the initial joint state (this is only possible because the two alternative states sent
by $\mathfrak{A}_1$ are orthogonal!).

Again collecting our lessons:

$\mathbf{R}_{q,F}$ depends not just on $P\mathcal{P}$.

In general $\mathbf{R}_{q,F}$ and $\mathbf{R}_{q,F_e}$ are different.



Concluding this section we may state that the pleasing situation of chapter I has
completely dissolved: all three rate concepts differ, and (except for entanglement fidelity)
the rate region depends not only on the average state.

Classical source with quantum side information

In this and the following section we will turn to the study of a restricted kind of multiple
source, namely $c^sh^1$-sources, and we will be able to complement the above bewildering
picture by some positive results (coding theorems).

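Theorem IV.10 (Code partition) (Cf. [Csiszár & Körner (1981)], proof of theorem
3.1.2) Let $W : \mathcal{X} \to \mathfrak{S}(\mathfrak{Y})$ be a q-DMC, $P$ a probability distribution on $\mathcal{X}$, $\lambda, \delta, \eta > 0$. Then
for $n \ge n_0(|\mathcal{X}|, \dim \mathcal{H}, \lambda, \delta, \eta)$ there exist $m - 1 \le \exp(n(H(P) - I(P; W) + 3\delta))$ many
$(n, \lambda)$-codes with pairwise disjoint "large" codebooks

$$|C_i| \ge \exp\big(n(I(P; W) - 2\delta)\big),$$

such that $P^n\big(\mathcal{X}^n \setminus \bigcup_{i=1}^{m-1} C_i\big) < \eta$.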

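Proof. Choose $\alpha > 0$ such that $P^n(\mathcal{T}^n_{P,\alpha}) \ge 1 - \eta/2$ and $n$ large enough such that for
every $\mathcal{A} \subset \mathcal{T}^n_{P,\alpha}$ with $P^n(\mathcal{A}) \ge \eta/2$ there is an $(n, \lambda)$-code with codebook $C \subset \mathcal{A}$ and
$|C| \ge \exp(n(I(P; W) - 2\delta))$ (by the coding theorem II.4). Now choose such a codebook
$C_1 \subset \mathcal{A}_1 = \mathcal{T}^n_{P,\alpha}$, and inductively $C_i \subset \mathcal{A}_i = \mathcal{A}_{i-1} \setminus C_{i-1}$ until $P^n(\mathcal{A}_i) < \eta/2$, say for $i = m$.
Obviously the codebooks are disjoint, and the rest has weight less than $\eta$. It remains to
estimate $m$:

$$(m - 1) \cdot \exp\big(n(I(P; W) - 2\delta)\big) \le \sum_{i=1}^{m-1} |C_i| \le |\mathcal{T}^n_{P,\alpha}|,$$

and since $|\mathcal{T}^n_{P,\alpha}| \le \exp(n(H(P) + \delta))$ for large enough $n$ (by the lemma bounding the size of typical sets) we get the statement. □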

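Consider the problem to encode the classical part of a $c^1h^1$-source $(X = \mathbb{C}\mathcal{X}, \mathfrak{Y}, \mathcal{X} \times \mathcal{P}, P)$,
using the quantum source as side information at the decoder:

An $n$-block coding scheme with quantum side information at the decoder is a pair
$(f, D)$, with a mapping $f : \mathcal{X}^n \to \mathcal{M}$ and an observable $D$ on $\mathbb{C}\mathcal{M} \otimes \mathfrak{Y}^{\otimes n}$ indexed by $\mathcal{X}^n$.
Its error probability (averaged over $P$) is

$$e(f, D) = 1 - \sum_{x^n, \pi^n} P^n(x^n, \pi^n)\, \mathrm{Tr}\big((f(x^n) \otimes \pi^n)\, D_{x^n}\big).$$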

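The proof of the following theorem goes back to an idea of [Ahlswede (1974b)]:

Theorem IV.11 (Rate slicing) (Cf. [Csiszár & Körner (1981)], theorem 3.1.2) For
every $\lambda, \delta > 0$ and $c^1h^1$-source $(X = \mathbb{C}\mathcal{X}, \mathfrak{Y}, \mathcal{X} \times \mathcal{P}, P)$ there exists an $n$-block code $(f, D)$
with quantum side information at the decoder such that

$$\frac{1}{n} \log |\mathcal{M}| \le H(X \mid \mathfrak{Y}) + 3\delta, \quad \text{and} \quad e(f, D) \le \lambda$$

whenever $n \ge n_0(|\mathcal{X}|, \dim \mathcal{H}, \lambda, \delta)$. Furthermore, the observable may be modified to the
operation $D'_* = \mathrm{Tr}_{\mathbb{C}\mathcal{M}} \circ D_{\mathrm{tot}*}$ from $(\mathbb{C}\mathcal{M})_* \otimes \mathfrak{Y}_*^{\otimes n}$ to $(\mathbb{C}\mathcal{X}^n)_* \otimes \mathfrak{Y}_*^{\otimes n}$ which satisfies

$$\sum_{x^n, \pi^n} P^n(x^n, \pi^n)\, \big\| x^n \otimes \pi^n - D'_*\big(f(x^n) \otimes \pi^n\big) \big\|_1 \le \sqrt{8\lambda} + \lambda.$$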
Proof. Define the q-DMC $W : \mathcal{X} \to \mathfrak{S}(\mathfrak{Y})$ by

$$W_x = \frac{1}{P_X(x)} \sum_{\pi \in \mathcal{P}} P(x, \pi)\, \pi$$

(with the marginal distribution $P_X$ of $P$ on $\mathcal{X}$). Choose $\eta < \lambda$ in theorem IV.10, and $n$
accordingly large such that codes $(g_i, D_i)$, $i \in [m - 1]$ like in that theorem exist. Assume
that their message sets coincide with their codebooks and that $g_i$ is the identity.
Define now

$$f(x^n) = \begin{cases} i & \text{if } x^n \in C_i, \\ m & \text{else.} \end{cases}$$

The decoder reads $i = f(x^n)$ and if $i \ne m$ uses $D_i$ to recover $x^n$ from the side information:
formally, $D$ consists of the operators $[i] \otimes D_{ic}$ for $i \in [m - 1]$, $c \in C_i$, and $[m] \otimes \mathbb{1}$. That
this has the desired properties is easily checked. Now for the second part: observe that

$$D'_* : [j] \otimes \rho \longmapsto \begin{cases} \sum_{c \in C_j} [c] \otimes \sqrt{D_{jc}}\, \rho\, \sqrt{D_{jc}} & \text{if } j < m, \\ [m] \otimes \rho & \text{if } j = m. \end{cases}$$

By the tender measurement lemma |3| and note |]6| the assertion follows. □ 

Remark IV.12 The decoder either says "don't know" (with probability at most $\lambda$ over
the source distribution $P^n$), or decodes correctly with maximal error probability $\lambda$.

Corollary IV.13 For the $c^1h^1$-source $(X = \mathbb{C}\mathcal{X}, \mathfrak{Y}, \mathcal{X} \times \mathcal{P}, P)$ the pair $(H(X \mid \mathfrak{Y}), H(\mathfrak{Y}))$
is (quantum, $F$)-achievable.

Proof. Combine theorem IV.11 with Schumacher's quantum coding. □
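To get a feeling for the rate $H(X \mid \mathfrak{Y})$ appearing here, one may evaluate it for a small classical-quantum ensemble, using $H(X \mid \mathfrak{Y}) = H(X) + \sum_x P_X(x) H(W_x) - H\big(\sum_x P_X(x) W_x\big)$. The sketch below does this for a hypothetical source with uniform binary $X$ and qubit side information; the ensembles and all function names are assumptions made for illustration.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy (in bits) of a density matrix."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]              # drop numerically zero eigenvalues
    return float(-np.sum(ev * np.log2(ev)))

def shannon(p):
    """Shannon entropy (in bits) of a probability vector."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def rate_with_side_information(PX, W):
    """H(X|Y) for the classical-quantum state sum_x PX[x] |x><x| (x) W[x]."""
    avg = sum(p * w for p, w in zip(PX, W))
    holevo = entropy(avg) - sum(p * entropy(w) for p, w in zip(PX, W))
    return shannon(PX) - holevo

def proj(v):
    """Projector onto the normalized vector v."""
    v = np.asarray(v, dtype=float)
    v = v / np.linalg.norm(v)
    return np.outer(v, v)

PX = [0.5, 0.5]
# Side information in nonorthogonal states |0>, |+>: only partly informative.
rate_nonorthogonal = rate_with_side_information(PX, [proj([1, 0]), proj([1, 1])])
# Orthogonal states reveal x completely; identical states reveal nothing.
rate_orthogonal = rate_with_side_information(PX, [proj([1, 0]), proj([0, 1])])
rate_useless = rate_with_side_information(PX, [proj([1, 0]), proj([1, 0])])

print("nonorthogonal side information:", round(rate_nonorthogonal, 4))
print("orthogonal side information:   ", round(rate_orthogonal, 4))
print("useless side information:      ", round(rate_useless, 4))
```

The three cases interpolate between the rate zero (the decoder can read $x$ off the quantum system) and the full rate $H(X)$ (the quantum system carries no information about $x$).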
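Consider now the $c^sh^1$-source

$$\big((X_i = \mathbb{C}\mathcal{X}_i \mid i \in [s]),\ \mathfrak{Y},\ \mathcal{X}_1 \times \cdots \times \mathcal{X}_s \times \mathcal{P},\ P\big).$$

An $n$-block coding scheme with quantum side information at the decoder for this is
an $(s + 1)$-tuple $(f_1, \ldots, f_s, D)$ of mappings $f_i : \mathcal{X}_i^n \to \mathcal{M}_i$ and an observable $D$ on
$\mathbb{C}\mathcal{M}_1 \otimes \cdots \otimes \mathbb{C}\mathcal{M}_s \otimes \mathfrak{Y}^{\otimes n}$, indexed by $\mathcal{X}_1^n \times \cdots \times \mathcal{X}_s^n$. Its error probability (averaged over $P$) is

$$e(f_1, \ldots, f_s, D) = 1 - \sum P^n(x_1^n, \ldots, x_s^n, \rho)\, \mathrm{Tr}\big((f_1(x_1^n) \otimes \cdots \otimes f_s(x_s^n) \otimes \rho)\, D_{x_1^n \ldots x_s^n}\big).$$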

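Theorem IV.14 With the notation above and $\lambda, \delta > 0$ there exists an $n$-block coding
scheme with quantum side information at the decoder with

$$\forall J \subset [s]:\qquad \frac{1}{n} \log |\mathcal{M}_J| \le H\big(X(J) \mid X(J^c)\mathfrak{Y}\big) + |J| \cdot 3\delta$$

and error probability at most $\lambda$, whenever $n \ge n_0(|\mathcal{X}_i|, \dim \mathcal{H}, \lambda, \delta)$.
Moreover for the operation $D'_* = \mathrm{Tr}_{\mathbb{C}(\mathcal{M}_1 \times \cdots \times \mathcal{M}_s)} \circ D_{\mathrm{tot}*}$,

$$D'_* : \mathbb{C}(\mathcal{M}_1 \times \cdots \times \mathcal{M}_s)_* \otimes \mathfrak{Y}_*^{\otimes n} \to \mathbb{C}(\mathcal{X}_1^n \times \cdots \times \mathcal{X}_s^n)_* \otimes \mathfrak{Y}_*^{\otimes n},$$

it holds that

$$\sum P^n(x_1^n, \ldots, x_s^n, \rho)\, \big\| [x_1^n \ldots x_s^n] \otimes \rho - D'_*\big([f_1(x_1^n) \ldots f_s(x_s^n)] \otimes \rho\big) \big\|_1 \le \lambda.$$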

Proof. Only the second statement is to be proved. We use induction on $s$, the number of
sources: $s = 1$ is clear by direct application of the rate slicing theorem IV.11. For $s > 1$ it
is sufficient (by the time sharing principle) to consider only extreme points of the region:
thus w.l.o.g.

$$\frac{1}{n} \log |\mathcal{M}_1| \le H(X_1 \mid \mathfrak{Y}) + 3\delta$$
$$\frac{1}{n} \log |\mathcal{M}_2| \le H(X_2 \mid X_1\mathfrak{Y}) + 3\delta$$
$$\vdots$$
$$\frac{1}{n} \log |\mathcal{M}_s| \le H(X_s \mid X_1 \cdots X_{s-1}\mathfrak{Y}) + 3\delta.$$

The proof that these are indeed the extreme points is in the section Extreme points of
rate regions below.

Now by induction we have an $(n, \lambda/2)$-coding scheme for the source

$$\big((X_i = \mathbb{C}\mathcal{X}_i \mid i \in [s - 1]),\ X_s \otimes \mathfrak{Y},\ \mathcal{X}_1 \times \cdots \times \mathcal{X}_{s-1} \times (\mathcal{X}_s \times \mathcal{P}),\ P\big),$$

call its decoding operation $D'_{1*}$. By rate slicing we also have an $(n, \lambda/2)$-coding scheme
for the source $(X_s, \mathfrak{Y}, \mathcal{X}_s \times \mathcal{P}, P)$ with side information at the decoder, call its decoding
operation $D'_{2*}$. Then the concatenation $D'_* = D'_{1*} \circ (\mathrm{id} \otimes D'_{2*})$ of the two processes
obviously has the desired error properties, and it is readily checked that it has the stated
form. By tracing out $\mathfrak{Y}^{\otimes n}$ we recover the observable $D$. □

Remark IV.15 The theorem shows not only that we can use quantum side information
"just like" classical information to improve compression, but also that we can do so while
hardly disturbing the quantum information.

Corollary IV.16 For the above source all tuples $(R_1, \ldots, R_s, H(\mathfrak{Y}))$ satisfying

$$\forall J \subset [s]:\qquad \sum_{i \in J} R_i \ge H\big(X(J) \mid X(J^c)\mathfrak{Y}\big)$$

are (quantum, $F$)-achievable.

Proof. Combine theorem IV.14 with Schumacher's quantum coding. □



We close this section with a converse to these coding theorems: 

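Theorem IV.17 Still with the above source all (quantum, $F$)-achievable rate tuples of
the form $(R_1, \ldots, R_s, H(\mathfrak{Y}))$ satisfy

$$\forall J \subset [s]:\qquad \sum_{i \in J} R_i \ge H\big(X(J) \mid X(J^c)\mathfrak{Y}\big).$$

Proof. Otherwise we could by theorem III.5 construct an infinite sequence of transmission
$n$-block codes for the quantum multiple access channel

$$W : \mathcal{X}_1 \times \cdots \times \mathcal{X}_s \to \mathfrak{S}(\mathfrak{Y}), \qquad (x_1, \ldots, x_s) \longmapsto \frac{1}{P(x_1 \ldots x_s)} \sum_{\pi} P(x_1 \ldots x_s, \pi)\, \pi$$

which violate the outer bounds of theorem III.1. □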

Quantum source with classical side information 

The simplest instance of the problem considered in the previous section is the case of
the $c^1q^1$-source. There we solved the problem of compressing the classical source with
the quantum information as side information at the decoder, which gave us one extreme
point of the rate region of the multiple source coding problem. It is natural, therefore,
to consider the complementary problem of compressing the quantum source, using the
classical information as side information, preferably only at the decoder: this would give
us another extreme point, presumably completing the determination of the rate region of
the $c^1q^1$-source (if the bounds of theorem IV.1 are already the correct ones).

An $n$-block quantum source coding scheme with side information at the decoder for
the $c^1q^1$-source $(X = \mathbb{C}\mathcal{X}, \mathfrak{Y} = \mathcal{L}(\mathcal{H}), \mathcal{X} \times \mathcal{P}, P)$ is a pair $(\varepsilon_*, \delta_*)$ with a mapping

$$\varepsilon_* : \mathcal{P}^n \to \mathfrak{S}(\mathcal{L}(\mathcal{K}))$$

and a family of quantum operations

$$\delta_* : \mathcal{X}^n \times \mathcal{L}(\mathcal{K})_* \to \mathfrak{Y}_*^{\otimes n}.$$

Quantum and arbitrary encoding are as before, also rate, and the average fidelity is

$$F = F(\varepsilon_*, \delta_*) = \sum_{x^n, \pi^n} P^n(x^n, \pi^n) \cdot \mathrm{Tr}\big((\delta_*(x^n)\, \varepsilon_* \pi^n)\, \pi^n\big)$$

(average distortion $D(\varepsilon_*, \delta_*)$ similarly).
The limiting rates $R_{q,F}(\lambda)$ and $R_{a,F}(\lambda)$ are defined obviously. What can we say about
them? From theorem IV.1 we get at least




In fact even $R_{a,F}(\lambda) \ge H(\mathfrak{Y} \mid X)$ for $\lambda \in (0, 1)$: otherwise we could (with compressing
$X$ classically, e.g. by ignoring all non-typical sequences) compress the total source $X\mathfrak{Y}$
with asymptotically at most $R_{a,F}(\lambda) + H(X) < H(\mathfrak{Y} \mid X) + H(X) = H(X\mathfrak{Y})$ qubits per
symbol, contradicting theorem I.22.

At present we do not know if one can approach this bound. But let us make an
experiment! Assume that also the encoder has the side information, i.e. now

$$\varepsilon_* : \mathcal{X}^n \times \mathcal{P}^n \to \mathfrak{S}(\mathcal{L}(\mathcal{K})).$$

Since we are interested only in average performance it suffices that the scheme works
well for typical $x^n \in \mathcal{X}^n$, say $x^n \in \mathcal{T}^n_{P_X,\alpha}$. To encode this the encoder has just to
collect the positions of equal $x \in \mathcal{X}$ and do Schumacher quantum coding on blocks of
length $nP_X(x) \pm \alpha\sqrt{P_X(x)(1 - P_X(x))}\sqrt{n}$. This scheme, with side information both at
the encoder and the decoder, obviously achieves the rate $H(\mathfrak{Y} \mid X)$ asymptotically with
arbitrarily high fidelity.



The $c^0q^2$-source: coding vs. side information

With the $c^1q^1$-source the idea to consider extreme points in a certain convex region proved
useful, and in connection with this the idea to encode only part of the source while using
the rest as side information at the decoder.

Whereas this paradigm is of undoubted worth in the classical theory, where we took it
from (and which gave us some insights already for quantum communication problems, not
just in the two previous sections but also in chapter III), in general one must be cautious
with it: using quantum information often means using it up. As an illustration consider
once more the cloned wheel example IV.2:



Obviously we can encode $\mathfrak{A}_1$ with rate zero, with side information from $\mathfrak{A}_2$ at the
decoder, because the state $\pi$ on $\mathfrak{A}_2$ is a faithful copy of the lost state $\pi$ on $\mathfrak{A}_1$. This is of
course in contrast to theorem IV.1, and we can note our last lesson:

Coding independent sources is not reducible to coding with side information.



Extreme points of rate regions 

Here we prove the claim in the proof of theorem IV.14 that every extremal point of the
region of all $(R_1, \ldots, R_s)$ which satisfy for all $J \subset [s]$

$$R(J) = \sum_{i \in J} R_i \ge H\big(X(J) \mid \mathfrak{Y}X(J^c)\big) \qquad (J)$$

is of the form

$$R_{\pi(i)} = H\big(X_{\pi(i)} \mid \mathfrak{Y}X_{\pi(1)} \cdots X_{\pi(i-1)}\big), \qquad i \in [s],$$

for a permutation $\pi$ of the set $[s]$, and that these points all belong to the above region.

Assume that we have an extremal point: it follows that $s$ of the inequalities (J) are
met with equality. Choose one, say $K$:

$$R(K) = H\big(X(K) \mid \mathfrak{Y}X(K^c)\big).$$

We claim that we can find the remaining inequalities (J) met with equality among the
$J \subset K$ or $J \supset K$. This follows from the following

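Lemma IV.18 From $R(K) = H(X(K) \mid \mathfrak{Y}X(K^c))$ the validity of (J) for all $J$ follows
from the validity for those which contain $K$ or are contained in $K$.

Proof. First consider $J \supset K$: there we have

$$R(J \setminus K) \ge H\big(X(J) \mid \mathfrak{Y}X(J^c)\big) - H\big(X(K) \mid \mathfrak{Y}X(K^c)\big).$$

Thus for arbitrary $J$, setting $J_1 = J \cap K$, $J_2 = J \cap K^c$, one obtains

$$\begin{aligned}
R(J) &\ge H\big(X(J_1) \mid \mathfrak{Y}X(J_1^c)\big) + H\big(X(J_2 \cup K) \mid \mathfrak{Y}X(J_2^c \cap K^c)\big) - H\big(X(K) \mid \mathfrak{Y}X(K^c)\big) \\
&= H(\mathfrak{Y}X_1 \cdots X_s) - H\big(\mathfrak{Y}X(J_1^c)\big) - H\big(\mathfrak{Y}X(J_2^c \cap K^c)\big) + H\big(\mathfrak{Y}X(K^c)\big) \\
&\ge H(\mathfrak{Y}X_1 \cdots X_s) - H\big(\mathfrak{Y}X(J_1^c \cap J_2^c)\big) \\
&= H\big(X(J) \mid \mathfrak{Y}X(J^c)\big)
\end{aligned}$$

by strong subadditivity (theorem A.9), applied to $\mathfrak{A}_1 = X(J_2)$, $\mathfrak{A}_2 = \mathfrak{Y}X(K^c \setminus J_2)$ and
$\mathfrak{A}_3 = X(K \setminus J_1)$. □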

If $K$ is not a singleton there must be equalities below $K$, if $K \ne [s]$ there must be
some above: otherwise it is easily seen that we are not in an extremal point. So by
induction we arrive at a chain $\emptyset \ne K_1 \subset K_2 \subset \cdots \subset K_s = [s]$ of equalities, w.l.o.g.
$K_i = \{s, s - 1, \ldots, s + 1 - i\}$, which produces

$$R_i = H(X_i \mid \mathfrak{Y}X_1 \cdots X_{i-1}).$$

To see that this is indeed a point of the region apply again the lemma, iteratively.
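The statement that all permutation points belong to the region can be illustrated numerically in the fully classical case. The sketch below assumes, for simplicity, that the side information is itself classical (a commutative $\mathfrak{Y}$), draws a random joint distribution of $(X_1, X_2, X_3, Y)$, computes the point belonging to each permutation, and checks all inequalities (J). The distribution, the alphabet sizes and the function names are assumptions of this example, not part of the text.

```python
import itertools
import numpy as np

def H(p, axes):
    """Shannon entropy (bits) of the marginal of the joint array p on `axes`."""
    if len(axes) == 0:
        return 0.0
    drop = tuple(a for a in range(p.ndim) if a not in axes)
    m = p.sum(axis=drop).ravel()
    m = m[m > 0]
    return float(-(m * np.log2(m)).sum())

def cond_H(p, target, given):
    """Conditional entropy H(target | given) = H(target, given) - H(given)."""
    joint = tuple(sorted(set(target) | set(given)))
    return H(p, joint) - H(p, tuple(sorted(given)))

def vertex(p, order, side):
    """Rates R_{pi(i)} = H(X_{pi(i)} | Y X_{pi(1)} ... X_{pi(i-1)})."""
    rates, seen = {}, list(side)
    for i in order:
        rates[i] = cond_H(p, [i], seen)
        seen.append(i)
    return rates

def in_region(p, rates, sources, side, tol=1e-9):
    """Check R(J) >= H(X(J) | Y X(J^c)) for every nonempty J."""
    for k in range(1, len(sources) + 1):
        for J in itertools.combinations(sources, k):
            rest = [i for i in sources if i not in J] + list(side)
            if sum(rates[i] for i in J) < cond_H(p, J, rest) - tol:
                return False
    return True

rng = np.random.default_rng(0)
p = rng.random((2, 2, 2, 3))     # axes 0..2: X_1..X_3, axis 3: side information Y
p /= p.sum()
sources, side = (0, 1, 2), (3,)

vertices = [vertex(p, order, side) for order in itertools.permutations(sources)]
checks = [in_region(p, v, sources, side) for v in vertices]
print("all permutation points lie in the region:", all(checks))
```

By the chain rule the rates of every such point sum to $H(X_1X_2X_3 \mid Y)$, so each of them meets the inequality for $J = [s]$ with equality.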



Open questions 

The reader will have noticed that the past chapter consisted mainly of open questions, 
skilfully disguised as half theorems, examples and suggestions. For convenience we collect 
here some of the more important problems: 
Examples. Clarify example IV.2: are there coding schemes with $R_1 = 1$, $R_2 = 1/2$?

Clarify the examples IV.8 and IV.9: can one improve the bounds? Or can one actually
construct coding schemes with $R_1 = R_2 = 1/2$, or at least $R_1 = 1$, $R_2 = 1/2$ for one or
both of them?



The $c^1q^1$-source. Solve the $c^1q^1$-source completely! From the above it is enough to
consider quantum source coding with classical side information, since we conjecture that
theorem IV.1 gives (at least in this case) already the right bounds.



More complicated sources. Solve the $c^0q^2$-source: it seems that it is easier if we insist
on no entanglement, but this might be a deception.



Consider entanglement fidelity This seems to be the only right choice if dealing 
with arbitrary kinds of correlation. Also it simplifies things a bit: namely at least the 
result will depend only on the average state of the source. 



Techniques The only technique for code construction was the "code partition" trick. 
This is not satisfactory, as it destroys artificially the symmetry of the situation; also we 
have to resort to a channel coding theorem. 

A promising direct approach that may be converted to work for the quantum problems
is the hypergraph coloring paradigm (see Ahlswede (1979) and Ahlswede (1980)). Such
a program would involve elaborating further on techniques that may be described by the term
noncommutative combinatorics.



Guiding ideas? One of the initial motivations of the work in this chapter was the 
idea that the classical Slepian-Wolf theorem is one possibility to give operational 
meaning to conditional entropies. As such a thing is completely lacking in quantum 
information theory, and on the other hand only formal definitions of quantum conditional 
entropy exist (derived from analogies, say with classical quantities), without consistent 
operational meaning, one sees that solving the above coding theorems would clarify this 
point dramatically. 

It is interesting to note that already at this stage we can foresee (from the lessons 
we learned from our examples) that there must be necessarily several natural notions of 
conditional entropy. 

Also we observe that the theory around Schumacher's coherent information fails to 
give the right answers even to simple problems. I suspect that this comes from the fact 
that this theory builds on pair entanglement, whereas our situations involved multi-party 
entanglement.



Appendix A

Quantum Probability and 
Information 



In this appendix the basic mathematical machinery of quantum probability with special
attention to information theory is collected. Alongside we introduce a calculus of entropy
and information quantities in quantum systems. Whenever possible we refer to the
literature instead of giving full proofs.



Quantum systems 

In classical probability theory one has generally two ways of seeing things: either through 
distributions (and the relation of their images, mostly marginals), or through random 
variables (with a joint distribution). Both ways have their merits (though random vari- 
ables are considered more elegant), but basically they are equivalent, in particular none 
lacks anything without the other. Things are different in quantum probability, and we 
will take the following view: the analog of a distribution is a density operator on some 
complex Hilbert space, whereas the analog of random variables are observables, defined 
below. With density operators alone we can study physical processes transforming them, 
but every experiment involves some observable. Studying observables one usually fixes
the underlying density operator (as the statistics of the experiments depend on the latter),
but this does not appropriately reflect that we manipulate quantum states, or that we
have several alternative states.

For the following we refer to textbooks on C*-algebras like Arveson (1976), and
standard references on basic mathematics of quantum mechanics: Davies (1976), Kraus
(1983), and the more advanced Holevo (1982).



A C*-algebra with unit is a complex Banach space $\mathfrak{A}$ which is also a $\mathbb{C}$-algebra with
unit $\mathbb{1}$ and a $\mathbb{C}$-antilinear involution $*$, such that

$$\|AB\| \le \|A\|\,\|B\|, \qquad \|A^*\|^2 = \|A\|^2 = \|AA^*\|.$$

These algebras will be the mathematical models for quantum systems, and subsystems
are simply *-subalgebras (which are always assumed to be closed).

The set $\mathfrak{A}_+$ of $A \in \mathfrak{A}$ that can be written as $A = BB^*$ is called the positive cone of $\mathfrak{A}$;
it is norm closed, and induces a partial order $\le$. By the famous Gelfand-Naimark-Segal
representation theorem (see e.g. Arveson (1976)) every C*-algebra is isomorphic
to a closed *-subalgebra of some $\mathcal{L}(\mathcal{H})$, the algebra of bounded linear operators on the
Hilbert space $\mathcal{H}$. With us all C*-algebras will be of finite dimension. It is known that
those algebras are isomorphic to a direct sum of $\mathcal{L}(\mathcal{H}_i)$ (see e.g. Arveson (1976)). This
includes as extremal cases the algebras $\mathcal{L}(\mathcal{H})$, and the commutative algebras $\mathbb{C}\mathcal{X}$ over a
finite set $\mathcal{X}$, with the generators $x \in \mathcal{X}$ as idempotents. In particular we have on every
such algebra a well defined and unique trace functional, denoted $\operatorname{Tr}$, that assigns trace
one to all minimal positive idempotents.



States A state on a C*-algebra $\mathfrak{A}$ is a positive $\mathbb{C}$-linear functional $\rho$ with $\rho(\mathbb{1}) = 1$.
Positivity here means that its values on the positive cone are nonnegative. Clearly the
states form a convex set $\mathfrak{S}(\mathfrak{A})$ whose extreme points are called pure states, all others are
mixed. For $\mathfrak{A} = \mathcal{L}(\mathcal{H})$ the pure states are exactly the one-dimensional projectors, i.e.
using Dirac's bra-ket-notation, the $|\psi\rangle\langle\psi|$ with unit vector $|\psi\rangle \in \mathcal{H}$.

One can easily see that every state $\rho$ can be represented uniquely in the form $\rho(X) =
\operatorname{Tr}(\hat\rho X)$ for a positive, selfadjoint element $\hat\rho$ of $\mathfrak{A}$ with trace one (such elements are called
density operators). In the sequel we will therefore make no distinction between $\rho$ and its
density operator $\hat\rho$. The set of operators with finite trace will be denoted $\mathfrak{A}_*$, the trace
class in $\mathfrak{A}$ which contains the states and is a two-sided ideal in $\mathfrak{A}$, the Schatten-ideal
(in our finite dimensional case this is of course just $\mathfrak{A}$). Then $\operatorname{Tr}(\rho A)$ defines a
real bilinear and positive definite pairing of $\mathfrak{A}_{*s}$ and $\mathfrak{A}_s$, the selfadjoint parts of $\mathfrak{A}_*$ and
$\mathfrak{A}$, which makes $\mathfrak{A}_s$ the dual of $\mathfrak{A}_{*s}$. Notice that in this sense pure states are equivalently
described as minimal selfadjoint idempotents of $\mathfrak{A}$.



Observables Let $\mathcal{F}$ be a $\sigma$-algebra on some set $\Omega$, $\mathfrak{X}$ a C*-algebra. A map $X : \mathcal{F} \to \mathfrak{X}$
is called a positive operator valued measure (POVM), or an observable, with values in $\mathfrak{X}$
(or on $\mathfrak{X}$), if:

1. $X(\emptyset) = 0$, $X(\Omega) = \mathbb{1}$.

2. $E \subset F$ implies $X(E) \le X(F)$.

3. If $(E_n)_n$ is a countable family of pairwise disjoint sets in $\mathcal{F}$ then $X(\bigcup_n E_n) =
\sum_n X(E_n)$ (in general the convergence is to be understood in the weak topology:
for every state its value at the left equals the limit value at the right hand side).

If the values of the observable are all projection operators and $\Omega$ is the real line one speaks
of a spectral measure or a von Neumann observable.^1 An observable $X$ together with a
state $\rho$ yields a probability measure $P^X$ on $\Omega$ via the formula

$$P^X(E) = \operatorname{Tr}(\rho X(E)).$$

In this way we may view $X$ as a random variable with values in $\mathfrak{X}$, its distribution we
denote $P_X$ (note that $P_X$ may not be isomorphic to $P^X$: if $X$ takes the same value on
disjoint events, which means that $X$ introduces randomness by itself).

^1 Strictly speaking this term only applies to the expectation of the measure (in general an unbounded
operator), but this in turn by the spectral theorem determines the measure.
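The formula $P^X(E) = \operatorname{Tr}(\rho X(E))$ can be tried out on a small case. The following is a minimal sketch, not part of the original text: the qubit state, the three-outcome observable and all function names are assumptions chosen for illustration, and matrices are plain nested lists.

```python
def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def trace(a):
    """Trace of a square matrix."""
    return sum(a[i][i] for i in range(len(a)))


def outcome_distribution(rho, povm):
    """Return {j: Tr(rho X_j)} for a countable observable given as {j: X_j}."""
    return {j: trace(matmul(rho, x_j)).real for j, x_j in povm.items()}


# Illustrative state rho = |+><+| (an assumption of this sketch).
rho = [[0.5, 0.5], [0.5, 0.5]]

# Illustrative observable: three positive operators that sum to the unit.
povm = {
    "a": [[0.5, 0.0], [0.0, 0.0]],
    "b": [[0.0, 0.0], [0.0, 0.5]],
    "c": [[0.5, 0.0], [0.0, 0.5]],
}

dist = outcome_distribution(rho, povm)
print(dist)
```

The operator attached to outcome "c" is not a projector, so this observable is a POVM but not a von Neumann observable; the resulting numbers still form a probability distribution because the operators resolve the unit.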

Two observables $X$, $Y$ are said to be compatible, if they have values in the same algebra
and $XY = YX$ elementwise, i.e. for all $E \in \mathcal{F}_X$, $F \in \mathcal{F}_Y$: $X(E)Y(F) = Y(F)X(E)$
(note that it is possible for an observable not to be compatible with itself). By the way,
the term compatible may be defined in the obvious manner for arbitrary sets or collections of
operators, in which meaning we will use it in the sequel. If $X$, $Y$ are compatible we may
define their joint observable $XY : \mathcal{F}_X \times \mathcal{F}_Y \to \mathfrak{X}$ mapping $E \times F$ to $X(E)Y(F)$ (this
defines the product mapping uniquely just as in the classical case of product measures). In
fact we can analogously define the joint observable for any collection of pairwise compatible
observables.^2 As the random variable of a product $XY$ we will take $X \times Y$, rather than
$XY$ itself, with values in $\mathfrak{X} \times \mathfrak{X}$ (because the same product operator may be generated
in two different ways which we want to distinguish). To indicate this difference we will
sometimes write $X \cdot Y$ for the product.

Note that here we can see the reason why we cannot just consider all observables as
random variables (and forget about the state): they will not have a joint distribution, at
first of course only by our definition. But Bell's theorem (Bell (1964)) shows that one
runs into trouble if one tries to allow a joint distribution for noncompatible observables.
Conversely we see why we cannot do without observables, even though $\rho$ contains all
possible information: the crux is that we cannot access it due to the forbidden noncompatible
observables (a good account of this aspect of quantum theory is by Peres (1995)).

From now on all observables will be countable, i.e. w.l.o.g. they are defined on a
countable $\Omega$ with $\sigma$-algebra $2^\Omega$. This means that we may view an observable $X$ as a
resolution of $\mathbb{1}$ into a countable sum $\mathbb{1} = \sum_{j \in \Omega} X_j$ of positive operators $X_j$.

If $\mathfrak{A}_1$, $\mathfrak{A}_2$ are subalgebras of $\mathfrak{A}$, they are compatible if they commute elementwise
(again note that a subalgebra need not be compatible with itself: in fact it is iff it
is commutative). In this case the closed subalgebra generated (in fact: spanned) by the
products $A_1A_2$, $A_i \in \mathfrak{A}_i$, is denoted $\mathfrak{A}_1\mathfrak{A}_2$.

Operations Now we describe the transformations between quantum systems: a $\mathbb{C}$-linear
map $\varphi : \mathfrak{A}_2 \to \mathfrak{A}_1$ is called a quantum operation if it is completely positive (i.e.
positive, so that positive elements have positive images, and also the $\varphi \otimes \mathrm{id}_n$ are positive,
where $\mathrm{id}_n$ is the identity on the algebra of $n \times n$-matrices), and unit preserving. These
maps are in 1-1 correspondence with their (pre-)adjoints $\varphi_*$ by the trace form, mapping
states to states, and being completely positive and trace preserving.^3 Since here we restrict
ourselves to finite dimensional algebras the adjoint map simply goes from $\mathfrak{A}_1$ to $\mathfrak{A}_2$, but
to keep things well separated (which they actually are in the infinite case) we write the
adjoint as $\varphi_* : \mathfrak{A}_{1*} \to \mathfrak{A}_{2*}$, the dual map (in fact we consider this as the primary object
and the operator maps as their adjoint, which is the reason for writing subscript $*$). Notice
that $\varphi_*$ is sometimes considered as restricted to $\varphi_* : \mathfrak{S}(\mathfrak{A}_1) \to \mathfrak{S}(\mathfrak{A}_2)$.

^2 Observe however that in general a joint observable might exist for non-compatible (i.e.
non-commuting) observables. The operational meaning of this is that there is a common refinement of the
involved observables. If they commute then this certainly is possible as demonstrated, but commutativity
is not necessary.

A characterization of quantum operations is by the Stinespring dilation theorem (Stinespring (1955)):

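Theorem A.1 (Dilation) Let $\varphi : \mathfrak{A} \to \mathcal{L}(\mathcal{H})$ a linear map of C*-algebras. Then $\varphi$ is
completely positive if and only if there exist a representation $\alpha : \mathfrak{A} \to \mathcal{L}(\mathcal{K})$, with Hilbert
space $\mathcal{K}$, and a bounded linear map $V : \mathcal{H} \to \mathcal{K}$ such that

$$\forall A \in \mathfrak{A} \qquad \varphi(A) = V^* \alpha(A) V.$$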

For proof see e.g. Davies (1976). A commonly used corollary of this is

Corollary A.2 (cf. Kraus (1983)) Let $\varphi : \mathcal{L}(\mathcal{H}_2) \to \mathcal{L}(\mathcal{H}_1)$ a linear map of C*-algebras.
Then $\varphi$ is completely positive and unit preserving if and only if there exist linear maps
$B_i : \mathcal{H}_1 \to \mathcal{H}_2$ with $\sum_i B_i^* B_i = \mathbb{1}$ and

$$\forall A \in \mathcal{L}(\mathcal{H}_2) \qquad \varphi(A) = \sum_i B_i^* A B_i.$$
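The corollary can be checked on a concrete family of operators. The sketch below is an illustration only: the two matrices $B_0$, $B_1$ (the usual amplitude damping pair with an arbitrarily chosen parameter `gamma`) and all helper names are assumptions, not taken from the text. It verifies that $\varphi$ preserves the unit, that the adjoint $\varphi_*(\rho) = \sum_i B_i \rho B_i^*$ preserves the trace, and that both are linked by the trace form.

```python
import math


def dagger(a):
    """Conjugate transpose of a matrix given as nested lists."""
    return [[complex(a[i][j]).conjugate() for i in range(len(a))]
            for j in range(len(a[0]))]


def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matsum(mats):
    """Entrywise sum of a list of equally shaped matrices."""
    return [[sum(m[i][j] for m in mats) for j in range(len(mats[0][0]))]
            for i in range(len(mats[0]))]


def trace(a):
    """Trace of a square matrix."""
    return sum(a[i][i] for i in range(len(a)))


def apply_operation(kraus, a):
    """Operator picture: A -> sum_i B_i^* A B_i."""
    return matsum([matmul(matmul(dagger(b), a), b) for b in kraus])


def apply_adjoint(kraus, rho):
    """State picture: rho -> sum_i B_i rho B_i^*."""
    return matsum([matmul(matmul(b, rho), dagger(b)) for b in kraus])


gamma = 0.3  # illustrative damping parameter
kraus = [
    [[1.0, 0.0], [0.0, math.sqrt(1.0 - gamma)]],
    [[0.0, math.sqrt(gamma)], [0.0, 0.0]],
]
identity = [[1.0, 0.0], [0.0, 1.0]]

unit_image = apply_operation(kraus, identity)  # should be the unit again
print(unit_image)
```

Working with explicit $B_i$ keeps both pictures in one place: the same list of operators yields the unit preserving map on operators and the trace preserving map on states.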



Norms and norm inequalities For the C*-algebra $\mathcal{L}(\mathcal{H})$ of linear operators on the
complex Hilbert space $\mathcal{H}$ (of dimension $d$) the norm is the supremum norm $\|\cdot\|_\infty$, i.e.
for selfadjoint $A$, $\|A\|_\infty$ is the largest absolute value of an eigenvalue of $A$. The other
important norm we use is the trace norm $\|\cdot\|_1$: for selfadjoint $\alpha$, $\|\alpha\|_1$ is the sum of the
absolute values of all eigenvalues of $\alpha$. Note the important formula

$$\|\alpha\|_1 = \sup\{|\operatorname{Tr}(\alpha B)| : \|B\|_\infty \le 1\},$$

which explains the name "trace norm". Its proof is by the polar decomposition of $\alpha$,
see e.g. Arveson (1976). Also it implies immediately that $\|\cdot\|_1$ is nonincreasing under
quantum operations. Obviously

$$\|\alpha\|_\infty \le \|\alpha\|_1 \le d\,\|\alpha\|_\infty.$$

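If $\alpha$ is self-adjoint we have the unique decomposition (via diagonalization) $\alpha = \alpha_+ - \alpha_-$
into the positive and negative part of $\alpha$, where $\alpha_+, \alpha_- \ge 0$ and $\alpha_+\alpha_- = 0$. Then note

$$\|\alpha\|_1 = \operatorname{Tr}\alpha_+ + \operatorname{Tr}\alpha_- = \sup\{\operatorname{Tr}(\alpha B) : -\mathbb{1} \le B \le \mathbb{1}\}$$

and

$$\operatorname{Tr}\alpha_+ = \sup\{\operatorname{Tr}(\alpha B) : 0 \le B \le \mathbb{1}\}.$$

It should be clear that all the above suprema are in fact maxima.

Finally note that these observations still hold for any direct sum of $\mathcal{L}(\mathcal{H}_i)$, $d$ being
replaced by the sum of the $\dim\mathcal{H}_i$.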
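For a selfadjoint $2 \times 2$ matrix the norms and the decomposition into positive and negative part can be computed from the two eigenvalues. The following sketch assumes a real symmetric matrix and uses the closed formula for its eigenvalues; the example matrix (the difference of the states $|0\rangle\langle 0|$ and $|+\rangle\langle +|$) and the function names are illustrative assumptions.

```python
import math


def eigenvalues_sym2(m):
    """Eigenvalues of a real symmetric 2x2 matrix [[a, b], [b, c]], largest first."""
    a, b, c = m[0][0], m[0][1], m[1][1]
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    return mean + radius, mean - radius


def trace_norm(m):
    """Sum of the absolute values of the eigenvalues."""
    return sum(abs(ev) for ev in eigenvalues_sym2(m))


def sup_norm(m):
    """Largest absolute value of an eigenvalue."""
    return max(abs(ev) for ev in eigenvalues_sym2(m))


def positive_part_trace(m):
    """Tr of the positive part: sum of the positive eigenvalues."""
    return sum(max(ev, 0.0) for ev in eigenvalues_sym2(m))


def negative_part_trace(m):
    """Tr of the negative part: sum of the moduli of the negative eigenvalues."""
    return sum(max(-ev, 0.0) for ev in eigenvalues_sym2(m))


# alpha = |0><0| - |+><+|, a traceless selfadjoint matrix.
alpha = [[0.5, -0.5], [-0.5, -0.5]]

print(trace_norm(alpha), sup_norm(alpha))
print(positive_part_trace(alpha), negative_part_trace(alpha))
```

Since this $\alpha$ is traceless, its positive and negative parts have equal trace, and the upper bound $\|\alpha\|_1 \le d\,\|\alpha\|_\infty$ with $d = 2$ is attained.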



^3 In general this is only true if we restrict $\varphi$ to be a normal map, cf. Davies (1976).

Entropy and divergence



The von Neumann entropy of a state $\rho$ (introduced by von Neumann (1927))^4 is
defined as $H(\rho) = -\operatorname{Tr}(\rho\log\rho)$, which reduces to the usual Shannon entropy for a
commutative algebra because then a state is nothing but a probability distribution. For
states $\rho$, $\sigma$ also introduce the I-divergence, or simply divergence (first defined by Umegaki
(1962)) as $D(\rho\|\sigma) = \operatorname{Tr}(\rho(\log\rho - \log\sigma))$ with the convention that this is $\infty$ if $\operatorname{supp}\rho \not\le
\operatorname{supp}\sigma$ ($\operatorname{supp}\rho$ being the support of $\rho$, the minimal selfadjoint idempotent $p$ with $p\rho p = \rho$).
For properties of these quantities we will often refer to Ohya & Petz (1993), and to
Wehrl (1978). Three important facts we will use are

Theorem A.3 (Klein inequality) For positive operators $\rho$, $\sigma$ (not necessarily states)

$$D(\rho\|\sigma) \ge \frac{1}{2}\operatorname{Tr}(\rho - \sigma)^2 + \operatorname{Tr}(\rho - \sigma).$$

In particular for states the divergence is nonnegative, and zero if and only if they are
equal.

Proof. See Ohya & Petz (1993). □



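Lemma A.4 (Continuity) Let $\rho$, $\sigma$ states with $\|\rho - \sigma\|_1 \le \theta \le \frac{1}{2}$. Then

$$|H(\rho) - H(\sigma)| \le -\theta\log\frac{\theta}{d} = d\,\eta\!\left(\frac{\theta}{d}\right).$$

Proof. See Ohya & Petz (1993). □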



Theorem A.5 (Monotonicity) Let $\rho$, $\sigma$ be states on a C*-algebra $\mathfrak{A}$, and $\varphi_*$ a trace
preserving, completely positive linear map from states on $\mathfrak{A}$ to states on $\mathfrak{B}$. Then

$$D(\varphi_*\rho\,\|\,\varphi_*\sigma) \le D(\rho\|\sigma).$$

Proof. See Uhlmann (1977); the situation we are in was already solved by Lindblad
(1975). For a textbook account see Ohya & Petz (1993). □
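The two quantities of this section can be evaluated directly in small cases. The following is a sketch under stated assumptions: logarithms are taken to base 2, the entropy routine only handles $2 \times 2$ density matrices (via the closed eigenvalue formula), and the divergence routine only handles commuting states, given by their eigenvalue lists in a common eigenbasis, where it reduces to the classical divergence. All names are illustrative.

```python
import math


def shannon_entropy(probs):
    """-sum p log2 p, with the convention 0 log 0 = 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def qubit_eigenvalues(rho):
    """Eigenvalues of a Hermitian 2x2 matrix via the closed formula."""
    a, c = rho[0][0].real, rho[1][1].real
    b = abs(rho[0][1])
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    return [mean + radius, mean - radius]


def von_neumann_entropy(rho):
    """H(rho) = -Tr(rho log rho) for a qubit density matrix."""
    # Clip tiny negative values caused by rounding.
    return shannon_entropy([max(ev, 0.0) for ev in qubit_eigenvalues(rho)])


def divergence_commuting(p, q):
    """D(p||q) for commuting states with eigenvalue lists p, q (same eigenbasis)."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0.0:
            if q_i == 0.0:
                return math.inf  # support of p not below support of q
            total += p_i * (math.log2(p_i) - math.log2(q_i))
    return total


pure = [[0.5, 0.5], [0.5, 0.5]]    # |+><+|
mixed = [[0.5, 0.0], [0.0, 0.5]]   # maximally mixed qubit state

print(von_neumann_entropy(pure), von_neumann_entropy(mixed))
print(divergence_commuting([0.5, 0.5], [0.25, 0.75]))
```

The last call illustrates the nonnegativity statement of the Klein inequality in the commutative case; the support convention shows up as an infinite value when the second state vanishes where the first does not.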



Observable language 

This and the following two sections will introduce language (or formalism) to talk about 
entropy and information in the context of quantum systems in a transparent fashion. 

Fix a state on a C*-algebra, say $\rho$ on $\mathfrak{A}$, and let $X$, $Y$, $Z$ compatible observables on
$\mathfrak{A}$. These are then random variables with a joint distribution, and one defines entropy
$H(X)$, conditional entropy $H(X|Y)$, mutual information $I(X \wedge Y)$, and conditional mutual
information $I(X \wedge Y|Z)$ for these observables as the respective quantities for them
interpreted as random variables. Note however that these depend on the underlying state
$\rho$. In case of need we will thus add the state as an index, like $H_\rho(X) = H(X)$, etc.

^4 It was in fact introduced independently in the same year by Landau and Weyl.

As things are there is not much to say about that part of the theory. We only note
some useful formulas:

$$H(X|Y) = \sum_j \operatorname{Tr}(\rho Y_j)\,H_{\rho_j}(X), \quad\text{with } \rho_j = \frac{1}{\operatorname{Tr}(\rho Y_j)}\sqrt{Y_j}\,\rho\,\sqrt{Y_j}$$

(which is an easy calculation using the compatibility of $X$ and $Y$), and

$$I(X \wedge Y) = H(X) + H(Y) - H(XY) = D(P^{XY}\,\|\,P^X \otimes P^Y) = D(P_{X\cdot Y}\,\|\,P_X \otimes P_Y)$$

(whose analogue is known from classical information theory).
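For compatible observables everything reduces to a joint distribution, so the identity between the entropy form and the divergence form of the mutual information can be checked classically. The sketch below assumes base 2 logarithms and an arbitrarily chosen joint distribution; the function names are illustrative.

```python
import math


def entropy(probs):
    """Shannon entropy in bits, with 0 log 0 = 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def marginals(joint):
    """Row and column marginals of a joint distribution given as nested lists."""
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    return p_x, p_y


def mutual_information_from_entropies(joint):
    """I(X ^ Y) = H(X) + H(Y) - H(XY)."""
    p_x, p_y = marginals(joint)
    p_xy = [p for row in joint for p in row]
    return entropy(p_x) + entropy(p_y) - entropy(p_xy)


def mutual_information_as_divergence(joint):
    """I(X ^ Y) = D(P_XY || P_X (x) P_Y)."""
    p_x, p_y = marginals(joint)
    total = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0.0:
                total += p * math.log2(p / (p_x[i] * p_y[j]))
    return total


joint = [[0.4, 0.1], [0.1, 0.4]]  # illustrative joint distribution of (X, Y)

print(mutual_information_from_entropies(joint))
print(mutual_information_as_divergence(joint))
```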

Subalgebra language 

Let $\mathfrak{X}$, $\mathfrak{X}_1$, $\mathfrak{X}_2$, $\mathfrak{Y}$ compatible *-subalgebras of the C*-algebra $\mathfrak{A}$, and $\rho$ a fixed state on $\mathfrak{A}$.

First consider the inclusion map $\iota : \mathfrak{X} \hookrightarrow \mathfrak{A}$ (which is certainly completely positive)
and its adjoint $\iota_* : \mathfrak{A}_* \to \mathfrak{X}_*$. Define

$$H(\mathfrak{X}) = H(\iota_*\rho)$$

(where at the right hand appears the von Neumann entropy). For example for $\mathfrak{X} = \mathfrak{A}$
we obtain just the von Neumann entropy of $\rho$. For the trivial subalgebra $\mathbb{C} = \mathbb{C}\mathbb{1}$
(which obviously commutes with every subalgebra) we obtain, as expected, $H(\mathbb{C}) = 0$.
The general philosophy behind this definition is that $H(\mathfrak{X})$ is the von Neumann entropy
of the global state viewed through (or restricted to) the subsystem $\mathfrak{X}$. To reflect this in
the notation we define $\rho|_{\mathfrak{X}} = \iota_*\rho$.

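Now conditional entropy, mutual information, and conditional mutual information are
defined by reducing them to entropy quantities:

$$H(\mathfrak{X}|\mathfrak{Y}) = H(\mathfrak{X}\mathfrak{Y}) - H(\mathfrak{Y})$$

$$\begin{aligned}
I(\mathfrak{X}_1 \wedge \mathfrak{X}_2) &= H(\mathfrak{X}_1) + H(\mathfrak{X}_2) - H(\mathfrak{X}_1\mathfrak{X}_2) \\
 &= H(\mathfrak{X}_2) - H(\mathfrak{X}_2|\mathfrak{X}_1)
\end{aligned}$$

$$\begin{aligned}
I(\mathfrak{X}_1 \wedge \mathfrak{X}_2|\mathfrak{Y}) &= H(\mathfrak{X}_1|\mathfrak{Y}) + H(\mathfrak{X}_2|\mathfrak{Y}) - H(\mathfrak{X}_1\mathfrak{X}_2|\mathfrak{Y}) \\
 &= H(\mathfrak{X}_1\mathfrak{Y}) + H(\mathfrak{X}_2\mathfrak{Y}) - H(\mathfrak{X}_1\mathfrak{X}_2\mathfrak{Y}) - H(\mathfrak{Y}).
\end{aligned}$$

It is not at all clear a priori that these definitions are all well behaved: while it is
obvious from the definition that the entropy is always nonnegative, this is not true for the
conditional entropy (as was observed by several authors before): if $\mathfrak{A} = \mathfrak{X} \otimes \mathfrak{Y}$ and $\rho$ is a
pure entangled state then $H(\mathfrak{X}|\mathfrak{Y}) = -H(\mathfrak{Y}) < 0$. This might raise pessimism whether
the other two quantities also are (at least sometimes) pathological. This they are not (at
least not in this way), as will be shown in a moment: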

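We have the following commutative diagram of inclusions, and the natural multiplication
map $\mu$ (which is in fact a *-algebra homomorphism, and thus completely positive!):

[Commutative diagram: the embeddings $\varphi_i : \mathfrak{X}_i \to \mathfrak{X}_1 \otimes \mathfrak{X}_2$ ($i = 1, 2$), the multiplication map
$\mu : \mathfrak{X}_1 \otimes \mathfrak{X}_2 \to \mathfrak{X}_1\mathfrak{X}_2$, and the inclusions $\mathfrak{X}_i \hookrightarrow \mathfrak{X}_1\mathfrak{X}_2$, $j : \mathfrak{X}_1\mathfrak{X}_2 \hookrightarrow \mathfrak{A}$, $j_i : \mathfrak{X}_i \hookrightarrow \mathfrak{A}$.]

And hence the corresponding commutative diagram of adjoint maps (note that $\varphi_{1*}$ and
$\varphi_{2*}$ are just partial traces). With this we find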

$$\begin{aligned}
I(\mathfrak{X}_1 \wedge \mathfrak{X}_2) &= H(\mathfrak{X}_1) + H(\mathfrak{X}_2) - H(\mathfrak{X}_1\mathfrak{X}_2) \\
 &= H(j_{1*}\rho) + H(j_{2*}\rho) - H(j_*\rho) \\
 &= H(\varphi_{1*}\mu_*j_*\rho) + H(\varphi_{2*}\mu_*j_*\rho) - H(\mu_*j_*\rho) \\
 &= D(\mu_*j_*\rho\,\|\,\varphi_{1*}\mu_*j_*\rho \otimes \varphi_{2*}\mu_*j_*\rho)
\end{aligned}$$

by definition, then by commutativity of the diagram and the fact that $\mu_*$ preserves
eigenvalues of density operators (because $\mu$ is a surjective *-homomorphism, see lemma A.6
below), the last by direct calculation on the tensor product (just as for the classical
formula). From the last line we see that the mutual information is nonnegative because
the divergence is, by theorem A.3 (we could also have seen this already from the
definition by applying subadditivity of von Neumann entropy to the second last line, see
theorem A.9).

Lemma A.6 Let $\mu : \mathfrak{A} \to \mathfrak{B}$ a surjective *-algebra homomorphism. Then

1. For all pure states $p \in \mathfrak{S}(\mathfrak{A})$: $\mu(p)$ is pure or 0.

2. For all $A \in \mathfrak{A}$, $A \ge 0$: $\operatorname{Tr}A \ge \operatorname{Tr}\mu(A)$.

3. For pure $p \in \mathfrak{S}(\mathfrak{A})$, $q \in \mathfrak{S}(\mathfrak{B})$:

$$\mu_*(\mu(p)) = p \ \text{ or } \ \mu(p) = 0, \qquad \mu(\mu_*(\mu(p))) = \mu(p), \qquad \mu(\mu_*(q)) = q.$$

4. For $\rho \in \mathfrak{S}(\mathfrak{B})$, $\mu_*(\rho) = \sum_i a_i p_i$ diagonalization with the $a_i > 0$, then $\rho = \sum_i a_i\mu(p_i)$
is a diagonalization.

5. Conversely every diagonalization of a state on $\mathfrak{B}$ is by $\mu_*$ translated into a
diagonalization of its $\mu_*$-image.

Proof.

1. We have only to show that $\mu(p)$ is minimal if it is not 0: let $q'$ any pure state with
$q' \le \mu(p)$. Then

$$1 = \operatorname{Tr}(q'\mu(p)) = \operatorname{Tr}(\mu_*(q')p) \le \operatorname{Tr}(p) = 1.$$

So we must have equality which implies $p \le \mu_*(q')$, but both operators are states,
so $p = \mu_*(q')$. Because $\mu_*$ is injective this means that there is only one pure state
$q' \le \mu(p)$, i.e. $\mu(p)$ is pure.

2. We may write $A = \sum_i a_i p_i$ with pure states $p_i$ and $a_i \ge 0$. Then $\mu(A) = \sum_i a_i\mu(p_i)$
and since pure states have trace 1 the assertion follows from (1).

3. Let $A \in \mathfrak{A}$, $A \ge 0$. Then

$$\begin{aligned}
\operatorname{Tr}(\mu_*(\mu(p))A) &= \operatorname{Tr}(\mu(p)\mu(A)) = \operatorname{Tr}(\mu(p)\mu(A)\mu(p)) \\
 &= \operatorname{Tr}(\mu(pAp)) \le \operatorname{Tr}(pAp) = \operatorname{Tr}(pA).
\end{aligned}$$

Thus $\mu_*(\mu(p)) \le p$. If $\mu(p) \ne 0$ it is a pure state, hence $\mu_*(\mu(p))$ a state which
forces $\mu_*(\mu(p)) = p$. This proves the left formula, the middle follows immediately,
and for the right observe that we may choose a pure pre-image $p$ of $q$ (in fact that
will be $\mu_*(q)$, as one can see from (4)).

4. $\sum_i a_i\mu(p_i)$ is certainly the diagonalization of some positive operator since the $\mu(p_i)$
which are not 0 are by the homomorphism property and by (1) pairwise orthogonal
pure states. Now observe [two displayed relations, illegible in the source], hence equality, i.e.
all $\mu(p_i)$ are pure. From [a displayed relation involving $\mu_*(\rho)$, illegible in the source]
and injectivity of $\mu_*$ the assertion follows.

5. This is a direct consequence of (3) and (4).

□



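For the conditional mutual information we have to do somewhat more (yet from the
definition we see that its positivity will have something to do with the strong subadditivity
of von Neumann entropy, see theorem A.9):
Consider the following commutative diagram:

[Commutative diagram: tensor products of $\mathfrak{X}_1$, $\mathfrak{X}_2$, $\mathfrak{Y}$ (among them $\mathfrak{X}_2 \otimes \mathfrak{Y}$), connected by embeddings $\varphi$
and mapped by the multiplication maps $\mu$, $\mu_1$, $\mu_2$ onto the product subalgebras $\mathfrak{X}_1\mathfrak{Y}$, $\mathfrak{X}_2\mathfrak{Y}$, $\mathfrak{X}_1\mathfrak{X}_2\mathfrak{Y}$,
which are included in $\mathfrak{A}$.]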



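All maps there are completely positive, $\mu$, $\mu_1$, $\mu_2$ being *-homomorphisms. Thus the
adjoints of the various $\varphi$'s are partial traces and with $\sigma = \mu_*j_*\rho$: $H(\mathfrak{X}_1\mathfrak{X}_2\mathfrak{Y}) = H(\sigma)$,
$H(\mathfrak{X}_1\mathfrak{Y}) = H(\operatorname{Tr}_{\mathfrak{X}_2}\sigma)$, $H(\mathfrak{X}_2\mathfrak{Y}) = H(\operatorname{Tr}_{\mathfrak{X}_1}\sigma)$, $H(\mathfrak{Y}) = H(\operatorname{Tr}_{\mathfrak{X}_1 \otimes \mathfrak{X}_2}\sigma)$ (where we have
made use of lemma A.6 several times), and we can indeed apply strong subadditivity.
Finally let us remark the nice formulas

$$H(\mathfrak{X}) = H(\mathfrak{X}|\mathbb{C}), \qquad I(\mathfrak{X}_1 \wedge \mathfrak{X}_2) = I(\mathfrak{X}_1 \wedge \mathfrak{X}_2|\mathbb{C}).$$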



Example A.7 A very important special case of the definitions of this and the preceding
section occurs for tensor products of Hilbert spaces $\mathcal{L}(\mathcal{H}_1 \otimes \mathcal{H}_2) = \mathcal{L}(\mathcal{H}_1) \otimes \mathcal{L}(\mathcal{H}_2)$, or
more generally tensor products of C*-algebras: $\mathfrak{A} = \mathfrak{A}_1 \otimes \mathfrak{A}_2$. $\mathfrak{A}_1$, $\mathfrak{A}_2$ are *-subalgebras of
$\mathfrak{A}$ in the natural way, and are obviously compatible. The same then holds for observables
$A_i \subset \mathfrak{A}_i$, and similarly for more than two factors. In this case the restriction $\rho|_{\mathfrak{A}_i}$ is just
a partial trace.



Remark A.8 It should be clear that we introduced (having $H$) conditional entropy and
mutual information by formal analogy to the classical quantities. We cannot claim to
have an operational meaning of them in general; the theorems in the main text must be
seen as exceptions to this rule.

We are in this respect in accordance with Levitin (1998) who went even further
by rejecting the very name "conditional entropy" for $H(\cdot|\cdot)$, and proposed to return to
the name "correlation entropy" given by Stratonovich to the quantity $I(\cdot \wedge \cdot)$ on the
grounds of a strictly operational reasoning (which is only open to the one criticism that
Levitin always sticks with classical information, never acknowledging the unprecedented
properties of quantum information).



Common tongue 

The languages of the two preceding sections may be phrased in a unified formalism (the
"common tongue") using completely positive C*-algebra maps (in particular those from
or to commutative algebras, inclusion maps, and *-algebra homomorphisms, cf.
Stinespring (1955)).

That this is promising one can see from the observation that observables can be
interpreted in a natural way as C*-algebra maps: $X : \mathcal{F} \to \mathfrak{A}$ corresponds by linear extension
to $X : \mathfrak{B}(\Omega) \to \mathfrak{A}$, where $\mathfrak{B}(\Omega) = \mathfrak{B}(\Omega, \mathcal{F})$ is the algebra of bounded measurable
functions on $\Omega$. We follow the convention that in this algebra $j \in \Omega$ shall denote the function
that is 1 on $j$ and 0 elsewhere, so $X(j) = X_j$, and obviously $X_*(\rho)$ equals the distribution
on $\Omega$ induced by $X$ with $\rho$.

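Let us also introduce some notation for the observable $X$: the total observable operation
$X_{\mathrm{tot}} : \mathfrak{B}(\Omega) \otimes \mathfrak{A} \to \mathfrak{A}$ mapping $j \otimes A \mapsto \sqrt{X_j}\,A\,\sqrt{X_j}$, its interior part $X_{\mathrm{int}} = X_{\mathrm{tot}} \circ \iota_{\mathfrak{A}} :
\mathfrak{A} \to \mathfrak{A}$ with $A \mapsto \sum_j \sqrt{X_j}\,A\,\sqrt{X_j}$, and its exterior part $X_{\mathrm{ext}} = X_{\mathrm{tot}} \circ \iota_{\mathfrak{B}(\Omega)}$ which coincides
with $X$.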

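Consider compatible quantum operations $\varphi : \mathfrak{X} \to \mathfrak{A}$, $\psi : \mathfrak{Y} \to \mathfrak{A}$, etc. ($\varphi$, $\psi$ are
compatible if their images commute elementwise). In this case their product is the operation
$\varphi\psi : \mathfrak{X} \otimes \mathfrak{Y} \to \mathfrak{A}$ mapping $X \otimes Y \mapsto \varphi(X)\psi(Y)$:

[Commutative diagram: $\varphi : \mathfrak{X} \to \mathfrak{A}$ and $\psi : \mathfrak{Y} \to \mathfrak{A}$ factor through the embeddings of $\mathfrak{X}$ and $\mathfrak{Y}$ into
$\mathfrak{X} \otimes \mathfrak{Y}$, followed by $\varphi\psi : \mathfrak{X} \otimes \mathfrak{Y} \to \mathfrak{A}$.]

Note that this generalizes the product of observables, as well as the product map $\mu$ of
subalgebras.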

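Now simply define $H(\varphi) = H(\varphi_*\rho)$, and again the conditional entropy and the
informations are defined by reduction to entropy, e.g. $H(\varphi|\psi) = H(\varphi\psi) - H(\psi)$, or
$I(\varphi \wedge \psi) = H(\varphi) + H(\psi) - H(\varphi\psi)$.

For the mutual information observe that (see previous diagram)

$$I(\varphi \wedge \psi) = D(\sigma\,\|\,\operatorname{Tr}_{\mathfrak{Y}}\sigma \otimes \operatorname{Tr}_{\mathfrak{X}}\sigma)$$

with $\sigma = (\varphi\psi)_*\rho$.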



Note the difference to Ohya & Petz (1993): with them the entropy of an operation is
related to the mutual information of the operation as a channel. With us the entropy of 
an operation is the entropy of a state "viewed through" this operation (as was the idea 
with the entropy of a subsystem, and obviously also with the entropy of an observable). 

With these insights we may now form hybrid expressions involving observables and
*-subalgebras at the same time: let $\iota : \mathfrak{X} \hookrightarrow \mathfrak{A}$, $j : \mathfrak{Y} \hookrightarrow \mathfrak{A}$ *-subalgebra inclusions, and
$X$, $Y$ observables on $\mathfrak{A}$, all four compatible. Then we have

$$H(\mathfrak{X}|Y) = H(\iota Y) - H(Y)$$

$$I(\mathfrak{X} \wedge Y) = H(\iota) + H(Y) - H(\iota Y),$$

and lots of others. From the previous section we know that the information quantities
are nonnegative, but also the entropy conditional on an observable, from the formula

$$H(\mathfrak{X}|Y) = \sum_j \operatorname{Tr}(\rho Y_j)\,H_{\rho_j}(\mathfrak{X}), \quad\text{with } \rho_j = \frac{1}{\operatorname{Tr}(\rho Y_j)}\sqrt{Y_j}\,\rho\,\sqrt{Y_j}.$$

But again there are some expressions which seem suspicious, like

$$H(X|\mathfrak{Y}) = H(Xj) - H(\mathfrak{Y}).$$

However, due to the inequality of theorem A.20 in fact it behaves nicely.



Inequalities 

Entropy Let us first note the basic

Theorem A.9 For compatible *-subalgebras $\mathfrak{A}_1$, $\mathfrak{A}_2$, $\mathfrak{A}_3$ one has:

1. Subadditivity: $H(\mathfrak{A}_1\mathfrak{A}_2) \le H(\mathfrak{A}_1) + H(\mathfrak{A}_2)$.

2. Strong subadditivity: $H(\mathfrak{A}_1\mathfrak{A}_2\mathfrak{A}_3) + H(\mathfrak{A}_2) \le H(\mathfrak{A}_1\mathfrak{A}_2) + H(\mathfrak{A}_2\mathfrak{A}_3)$.

(In our language this is equivalent to the more natural form
$H(\mathfrak{A}_1\mathfrak{A}_3|\mathfrak{A}_2) \le H(\mathfrak{A}_1|\mathfrak{A}_2) + H(\mathfrak{A}_3|\mathfrak{A}_2)$.)

Proof. Subadditivity is a special case of strong subadditivity: $\mathfrak{A}_2 = \mathbb{C}$. The latter can
be reduced to the familiar form, proved first by Lieb & Ruskai (see the references in
Uhlmann (1977)), by the same type of argument as we used in the section Subalgebra
language for the nonnegativity of conditional mutual information. □
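The equivalence of the two forms of strong subadditivity claimed in the theorem is a one-line computation from the definition $H(\mathfrak{X}|\mathfrak{Y}) = H(\mathfrak{X}\mathfrak{Y}) - H(\mathfrak{Y})$ of the section Subalgebra language. Spelled out (the notation is that of the theorem; nothing beyond the definition is assumed):

```latex
\begin{aligned}
H(\mathfrak{A}_1\mathfrak{A}_3|\mathfrak{A}_2) &\le H(\mathfrak{A}_1|\mathfrak{A}_2) + H(\mathfrak{A}_3|\mathfrak{A}_2) \\
\iff\quad H(\mathfrak{A}_1\mathfrak{A}_2\mathfrak{A}_3) - H(\mathfrak{A}_2)
   &\le \bigl(H(\mathfrak{A}_1\mathfrak{A}_2) - H(\mathfrak{A}_2)\bigr) + \bigl(H(\mathfrak{A}_2\mathfrak{A}_3) - H(\mathfrak{A}_2)\bigr) \\
\iff\quad H(\mathfrak{A}_1\mathfrak{A}_2\mathfrak{A}_3) + H(\mathfrak{A}_2)
   &\le H(\mathfrak{A}_1\mathfrak{A}_2) + H(\mathfrak{A}_2\mathfrak{A}_3).
\end{aligned}
```

Equivalently, by the definition of conditional mutual information, strong subadditivity states $I(\mathfrak{A}_1 \wedge \mathfrak{A}_3|\mathfrak{A}_2) \ge 0$.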

Another kind of inequality may serve as an operational justification of the definition
of von Neumann entropy. Call a quantum operation $\varphi : \mathfrak{A}_1 \to \mathfrak{A}_2$ doubly stochastic if
it preserves the trace, i.e. for all $A \in \mathfrak{A}_1$: $\operatorname{Tr}\varphi(A) = \operatorname{Tr}A$ (see Ohya & Petz (1993)).
We will consider the less restrictive condition $\operatorname{Tr}\varphi(A) \le \operatorname{Tr}A$, and for an observable
$X$, a *-subalgebra $\mathfrak{X}$ let us say it is maximal in $\mathfrak{A}$ if $X$, the inclusion map has this
property, respectively (obviously for the *-subalgebra this implies doubly stochastic).
Main examples are: an observable whose atoms are minimal in the target algebra, i.e.
have only trivial decompositions into positive operators, and a maximal commutative
*-subalgebra.

Theorem A.10 (Entropy increase) Let $\varphi : \mathfrak{Y} \to \mathfrak{X}$ with $\operatorname{Tr}\varphi(A) \le \operatorname{Tr}A$, and $\psi :
\mathfrak{X} \to \mathfrak{A}$ quantum operations. Then $H(\psi \circ \varphi) \ge H(\psi)$. (Notice that in the physical sense
the operation $\varphi_*$ is applied after $\psi_*$.)

Before we prove this let us note two important cases of equality: Let $\rho = \sum_i \lambda_i p_i$ with
mutually orthogonal pure states $p_i$, $\lambda_i \ge 0$, $\sum_i p_i = \mathbb{1}$. Then equality holds for the
*-subalgebra generated by the $p_i$ (in fact for any *-subalgebra which contains them), and
for the observable that corresponds to the $p_i$'s resolution of $\mathbb{1}$.

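Proof of theorem A.10. Let $\sigma = \psi_*\rho$, we have to prove $H(\varphi_*\sigma) \ge H(\sigma)$. From the
previous discussion we see that we may assume $\mathfrak{Y}$ to be commutative, without changing
the trace relation. Let $\sigma = \sum_i a_i p_i$ a diagonalization with pure states $p_i$ on $\mathfrak{X}$, and $q_j$ the
family of minimal idempotents of $\mathfrak{Y}$ (which by commutativity are orthogonal). Then we
have decompositions $\varphi_*p_i = \sum_j p_{ij}q_j$, hence

$$\varphi_*\sigma = \sum_i a_i \sum_j p_{ij}q_j = \sum_j \Bigl(\sum_i a_i p_{ij}\Bigr) q_j.$$

Now observe that for all $j$

$$\sum_i p_{ij} = \operatorname{Tr}\Bigl(q_j \sum_i \varphi_*p_i\Bigr) = \operatorname{Tr}\Bigl(\varphi(q_j)\sum_i p_i\Bigr) = \operatorname{Tr}(\varphi(q_j)) \le \operatorname{Tr}(q_j) = 1,$$

and the result follows from the formulas $H(\sigma) = H(a_i|i)$, $H(\varphi_*\sigma) = H(\sum_i p_{ij}a_i|j)$. □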

Let us formulate the special cases of maximal observables and maximal *-subalgebras
as a corollary:

Corollary A.11 Let $X$ an observable maximal in $\mathfrak{X}$, then $H(X) \ge H(\mathfrak{X})$. Let $\mathfrak{X}'$ a
*-subalgebra maximal in $\mathfrak{X}$, then $H(\mathfrak{X}') \ge H(\mathfrak{X})$. □

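An application of this is in the proof of

Theorem A.12 Let $\mathfrak{X}$, $\mathfrak{Y}$ compatible, $\rho|_{\mathfrak{X}\mathfrak{Y}}$ pure. Then $H(\mathfrak{X}) = H(\mathfrak{Y})$.

Proof. By retracting the state $\rho$ to $\mathfrak{X} \otimes \mathfrak{Y}$ by the multiplication map $\mu : \mathfrak{X} \otimes \mathfrak{Y} \to \mathfrak{X}\mathfrak{Y}$
(see lemma A.6) and embedding $\mathfrak{X}$ and $\mathfrak{Y}$ into full matrix algebras (see the proof of the
next theorem) we may assume that we have a pure state $\rho$ on $\mathcal{L}(\mathcal{H}_1) \otimes \mathcal{L}(\mathcal{H}_2)$ (entropies
do not change as the *-subalgebras are maximal). Then the assertion of the theorem is
$H(\operatorname{Tr}_{\mathfrak{X}}\rho) = H(\operatorname{Tr}_{\mathfrak{Y}}\rho)$ which is well known (proof via the Schmidt decomposition of $|\psi\rangle$
where $\rho = |\psi\rangle\langle\psi|$: cf. Peres (1995)). □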



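Theorem A.13 Let $\mathfrak{X}$, $\mathfrak{Y}$ compatible, $\rho$ any state. Then $|H(\mathfrak{X}) - H(\mathfrak{Y})| \le H(\mathfrak{X}\mathfrak{Y})$.

Proof. Like in the previous theorem we may assume that $\rho$ is a state on $\mathfrak{X} \otimes \mathfrak{Y}$, and by
symmetry we have to prove that

$$H(\mathfrak{X}) - H(\mathfrak{Y}) \le H(\mathfrak{X}\mathfrak{Y}).$$

If we think of $\mathfrak{X}$ and $\mathfrak{Y}$ as sums of full operator algebras, say $\mathfrak{X} = \bigoplus_i \mathcal{L}(\mathcal{H}_i)$, $\mathfrak{Y} =
\bigoplus_j \mathcal{L}(\mathcal{K}_j)$, then embedding them into $\mathcal{L}(\bigoplus_i \mathcal{H}_i)$, $\mathcal{L}(\bigoplus_j \mathcal{K}_j)$, respectively, does not change
the entropies involved (because the *-subalgebras are maximal). Thus we may assume
that $\mathfrak{X} = \mathcal{L}(\mathcal{H})$, $\mathfrak{Y} = \mathcal{L}(\mathcal{K})$. Now consider a purification of $\rho$ on the Hilbert space
$\mathcal{H} \otimes \mathcal{K} \otimes \mathcal{L}$ (see e.g. Schumacher (1996)): this means $\rho = \operatorname{Tr}_{\mathcal{L}(\mathcal{L})}|\psi\rangle\langle\psi|$. Now by
theorem A.12, with $\mathfrak{Z} = \mathcal{L}(\mathcal{L})$, $H(\mathfrak{X}) = H(\mathfrak{Y}\mathfrak{Z})$, $H(\mathfrak{X}\mathfrak{Y}) = H(\mathfrak{Z})$, and the assertion follows from the
subadditivity theorem $H(\mathfrak{Y}\mathfrak{Z}) \le H(\mathfrak{Y}) + H(\mathfrak{Z})$. □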



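Information The following inequality for mutual information is a straightforward
generalization of the Holevo bound (Holevo (1973), see theorem A.16 below):

Theorem A.14 Let $X$, $Y$ be compatible observables with values in the compatible
*-subalgebras $\mathfrak{X}$, $\mathfrak{Y}$, respectively. Then

$$I(X \wedge Y) \le I(\mathfrak{X} \wedge Y) \le I(\mathfrak{X} \wedge \mathfrak{Y}).$$

Proof. Consider the diagram

$$\mathfrak{B}(\Omega_X) \otimes \mathfrak{B}(\Omega_Y) \xrightarrow{\;X \otimes \mathrm{id}\;} \mathfrak{X} \otimes \mathfrak{B}(\Omega_Y) \xrightarrow{\;\mathrm{id} \otimes Y\;} \mathfrak{X} \otimes \mathfrak{Y} \xrightarrow{\;\mu\;} \mathfrak{A}$$

and apply the Lindblad-Uhlmann monotonicity theorem A.5 twice, with $\mu_*(\rho)$ and the
maps $(\mathrm{id} \otimes Y)_*$ and $(X \otimes \mathrm{id})_*$, one after the other. □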

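This can be greatly extended: for example if $\mathfrak{X} \subset \mathfrak{X}'$, $\mathfrak{Y} \subset \mathfrak{Y}'$, then

$$I(\mathfrak{X} \wedge \mathfrak{Y}) \le I(\mathfrak{X}' \wedge \mathfrak{Y}').$$

The most general form is

$$I(\psi_1 \circ \varphi_1 \wedge \psi_2 \circ \varphi_2) \le I(\psi_1 \wedge \psi_2)$$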

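in the diagram

[Commutative diagram: $\varphi_i : \mathfrak{A}_i' \to \mathfrak{A}_i$ and $\psi_i : \mathfrak{A}_i \to \mathfrak{A}$ ($i = 1, 2$), together with
$\varphi_1 \otimes \varphi_2 : \mathfrak{A}_1' \otimes \mathfrak{A}_2' \to \mathfrak{A}_1 \otimes \mathfrak{A}_2$ and $\psi = \psi_1\psi_2 : \mathfrak{A}_1 \otimes \mathfrak{A}_2 \to \mathfrak{A}$.]

Remark A.15 It is worth noting that the above formulation of the information bound has
the nice form of a data processing inequality. To dwell on this point a little more, and at
the same time link our discussion with the traditional view and the language employed in
the chapters of the main text, let us define for a (measurable) map $\varphi_* : \mathcal{X} \to \mathfrak{S}(\mathfrak{Y})$
(which we identify with its linear extension to $\mathbb{C}\mathcal{X}$ and regard as a quantum channel, as in
the main text) and a p.d. $P$ on $\mathcal{X}$

$$I(P; \varphi_*) = I_\gamma(\mathbb{C}\mathcal{X} \wedge \mathfrak{Y})$$

with the channel state $\gamma = \sum_{x \in \mathcal{X}} P(x)[x] \otimes \varphi_*(x)$. It is easily verified that

$$I(P; \varphi_*) = H(P\varphi_*) - H(\varphi_*|P), \quad\text{where}\quad H(\varphi_*|P) = \sum_{x \in \mathcal{X}} P(x)H(\varphi_*(x)).$$

Now with a quantum operation $\psi_* : \mathfrak{Y}_* \to \mathfrak{Z}_*$ the data processing inequality specializes to

$$I(P; \psi_* \circ \varphi_*) \le I(P; \varphi_*).$$

In particular if $\mathfrak{Z}$ is commutative, i.e. the operation, now denoted $D_*$, is a measurement,
we recover the

Theorem A.16 (Holevo bound) $I(P; D_* \circ \varphi_*) \le I(P; \varphi_*)$. □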
In chapter |l| an elementary proof of this inequality is presented. 
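The Holevo bound can be observed numerically on a small ensemble. The sketch below is illustrative only: the ensemble (the qubit states $|0\rangle\langle 0|$ and $|+\rangle\langle +|$ with equal weights), the choice of measurement (the standard basis) and all function names are assumptions, and logarithms are taken to base 2. It evaluates $I(P; \varphi_*) = H(P\varphi_*) - \sum_x P(x)H(\varphi_*(x))$ before and after the measurement.

```python
import math


def shannon(probs):
    """Shannon entropy in bits, with 0 log 0 = 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def qubit_entropy(rho):
    """von Neumann entropy of a real symmetric 2x2 density matrix."""
    a, b, c = rho[0][0], rho[0][1], rho[1][1]
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    return shannon([max(mean + radius, 0.0), max(mean - radius, 0.0)])


def mix(weights, states):
    """Average state sum_x P(x) phi(x)."""
    return [[sum(w * s[i][j] for w, s in zip(weights, states)) for j in range(2)]
            for i in range(2)]


def information(weights, states):
    """I(P; phi) = H(P phi) - sum_x P(x) H(phi(x))."""
    average = qubit_entropy(mix(weights, states))
    conditional = sum(w * qubit_entropy(s) for w, s in zip(weights, states))
    return average - conditional


def measure_standard_basis(rho):
    """Measurement in the standard basis: keep only the diagonal (a distribution)."""
    return [[rho[0][0], 0.0], [0.0, rho[1][1]]]


weights = [0.5, 0.5]
states = [
    [[1.0, 0.0], [0.0, 0.0]],  # |0><0|
    [[0.5, 0.5], [0.5, 0.5]],  # |+><+|
]

holevo_quantity = information(weights, states)
measured = information(weights, [measure_standard_basis(s) for s in states])
print(holevo_quantity, measured)
```

For this ensemble the quantity before measurement is about 0.60 bits and the information obtained from the standard basis measurement about 0.31 bits; other measurements may do better, but by the theorem none can exceed the first value.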



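Theorem A.17 Let $\mathfrak{X}_1$, $\mathfrak{X}_2$, $\mathfrak{Y}_1$, $\mathfrak{Y}_2$ compatible *-subalgebras of $\mathfrak{A}$, $\rho$ a state on $\mathfrak{A}$. Then

$$I(\mathfrak{X}_1\mathfrak{X}_2 \wedge \mathfrak{Y}_1\mathfrak{Y}_2) \le I(\mathfrak{X}_1 \wedge \mathfrak{Y}_1) + I(\mathfrak{X}_2 \wedge \mathfrak{Y}_2)$$

if $I(\mathfrak{Y}_1 \wedge \mathfrak{X}_2\mathfrak{Y}_2|\mathfrak{X}_1) = 0$ and $I(\mathfrak{Y}_2 \wedge \mathfrak{X}_1\mathfrak{Y}_1|\mathfrak{X}_2) = 0$ (i.e. $\mathfrak{Y}_k$ is independent from the other
*-subalgebras conditional on $\mathfrak{X}_k$).

Proof. First observe that the conditional independence mentioned, $I(\mathfrak{Y}_1 \wedge \mathfrak{X}_2\mathfrak{Y}_2|\mathfrak{X}_1) =
0$, is equivalent to $H(\mathfrak{Y}_1|\mathfrak{X}_1\mathfrak{X}_2\mathfrak{Y}_2) = H(\mathfrak{Y}_1|\mathfrak{X}_1)$. By theorem A.23 we then have also
$H(\mathfrak{Y}_1|\mathfrak{X}_1\mathfrak{X}_2) = H(\mathfrak{Y}_1|\mathfrak{X}_1)$. Now observe (with the obvious chain rule)

$$\begin{aligned}
H(\mathfrak{Y}_1\mathfrak{Y}_2|\mathfrak{X}_1\mathfrak{X}_2) &= H(\mathfrak{Y}_1|\mathfrak{X}_1\mathfrak{X}_2\mathfrak{Y}_2) + H(\mathfrak{Y}_2|\mathfrak{X}_1\mathfrak{X}_2) \\
 &= H(\mathfrak{Y}_1|\mathfrak{X}_1) + H(\mathfrak{Y}_2|\mathfrak{X}_2)
\end{aligned}$$

and hence

$$\begin{aligned}
I(\mathfrak{X}_1\mathfrak{X}_2 \wedge \mathfrak{Y}_1\mathfrak{Y}_2) &= H(\mathfrak{Y}_1\mathfrak{Y}_2) - H(\mathfrak{Y}_1\mathfrak{Y}_2|\mathfrak{X}_1\mathfrak{X}_2) \\
 &\le H(\mathfrak{Y}_1) + H(\mathfrak{Y}_2) - H(\mathfrak{Y}_1|\mathfrak{X}_1) - H(\mathfrak{Y}_2|\mathfrak{X}_2) \\
 &= I(\mathfrak{X}_1 \wedge \mathfrak{Y}_1) + I(\mathfrak{X}_2 \wedge \mathfrak{Y}_2)
\end{aligned}$$

where we have used the subadditivity of von Neumann entropy, theorem A.9. □

The same obviously applies if we have $n$ *-subalgebras $\mathfrak{X}_k$, and $n$ $\mathfrak{Y}_k$, all compatible, and
if $\mathfrak{Y}_k$ is independent from the others given $\mathfrak{X}_k$, i.e. for all $k$

$$H(\mathfrak{Y}_k|\mathfrak{X}_1 \cdots \mathfrak{X}_n\mathfrak{Y}_1 \cdots \widehat{\mathfrak{Y}_k} \cdots \mathfrak{Y}_n) = H(\mathfrak{Y}_k|\mathfrak{X}_k).$$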



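Corollary A.18 Let $\mathfrak{X}_1, \ldots, \mathfrak{X}_n$, $\mathfrak{Y}_1, \ldots, \mathfrak{Y}_n$ C*-algebras, $\mathfrak{X}_i = \mathbb{C}\mathcal{X}_i$ commutative, and
$\mathfrak{A} = \mathfrak{X}_1 \otimes \cdots \otimes \mathfrak{X}_n \otimes \mathfrak{Y}_1 \otimes \cdots \otimes \mathfrak{Y}_n$. Then with the state

$$\gamma = \sum P(x_1, \ldots, x_n)\,[x_1] \otimes \cdots \otimes [x_n] \otimes W_{x_1} \otimes \cdots \otimes W_{x_n}$$

on $\mathfrak{A}$ (where $P$ is a p.d. on $\mathcal{X}_1 \times \cdots \times \mathcal{X}_n$, and $W$ maps the $x_i$ to states on $\mathfrak{Y}_i$):

$$I(\mathfrak{X}_1 \cdots \mathfrak{X}_n \wedge \mathfrak{Y}_1 \cdots \mathfrak{Y}_n) \le \sum_{k=1}^n I(\mathfrak{X}_k \wedge \mathfrak{Y}_k).$$

Proof. We only have to check the conditional independence, which is left to the reader. □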
We note another estimate for the mutual information: 

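Theorem A.19 For compatible *-subalgebras $\mathfrak{X}, \mathfrak{Y}$: $I(\mathfrak{X} \wedge \mathfrak{Y}) \le 2\min\{H(\mathfrak{X}), H(\mathfrak{Y})\}$.

Proof. Put together the formula $I(\mathfrak{X} \wedge \mathfrak{Y}) = H(\mathfrak{X}) - H(\mathfrak{X}|\mathfrak{Y})$ and the simple estimate $H(\mathfrak{X}|\mathfrak{Y}) \ge -H(\mathfrak{X})$ from theorem |^l3l □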
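A standard two-qubit example (not taken from the text, and assuming that $\mathfrak{X}$ and $\mathfrak{Y}$ are the two tensor factors of a two-qubit system) suggests that the factor 2 cannot be removed without further assumptions, and that the estimate $H(\mathfrak{X}|\mathfrak{Y}) \ge -H(\mathfrak{X})$ can hold with equality:

```latex
\begin{aligned}
\rho &= |\Phi\rangle\langle\Phi|, \qquad
  |\Phi\rangle = \tfrac{1}{\sqrt{2}}\bigl(|00\rangle + |11\rangle\bigr), \\
H(\mathfrak{X}\mathfrak{Y}) &= 0, \qquad
  H(\mathfrak{X}) = H(\mathfrak{Y}) = \log 2, \\
H(\mathfrak{X}|\mathfrak{Y}) &= H(\mathfrak{X}\mathfrak{Y}) - H(\mathfrak{Y})
  = -\log 2 = -H(\mathfrak{X}), \\
I(\mathfrak{X} \wedge \mathfrak{Y}) &= H(\mathfrak{X}) - H(\mathfrak{X}|\mathfrak{Y})
  = 2\log 2 = 2\min\{H(\mathfrak{X}), H(\mathfrak{Y})\}.
\end{aligned}
```

Here the joint state is pure, so its entropy vanishes, while both marginals are maximally mixed.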

Conditional entropy. We start with a simple positivity condition:

Theorem A.20 Let $\varphi : \mathfrak{X} \to \mathfrak{A}$, $\psi : \mathfrak{Y} \to \mathfrak{A}$ be compatible quantum operations with $\mathfrak{X}$ or $\mathfrak{Y}$ commutative. Then $H(\varphi|\psi) \ge 0$.

Proof. Let $\sigma = (\varphi\psi)_*\rho$, then by definition and lemma A.6

$$H(\varphi|\psi) = H(\sigma) - H(\operatorname{Tr}_{\mathfrak{X}}\sigma).$$

First case: $\mathfrak{X}$ is commutative, so we can write $\sigma = \sum_x Q(x)\,[x] \otimes \tau_*(x)$ with a distribution $Q$ on $\mathcal{X}$, and states $\tau_*(x)$ on $\mathfrak{Y}$. Obviously $H(\sigma) = H(Q) + \sum_x Q(x) H(\tau_*(x))$ and $\operatorname{Tr}_{\mathfrak{X}}\sigma = \sum_x Q(x)\tau_*(x) = Q\tau_*$, and hence

$$\begin{aligned}
H(\varphi|\psi) &= H(Q) - \Bigl(H(Q\tau_*) - \sum_x Q(x) H(\tau_*(x))\Bigr) \\
&= H(Q) - I(Q;\tau_*) \ge 0,
\end{aligned}$$

the last step by an application of the Holevo bound, theorem A.16.

Second case: $\mathfrak{Y}$ is commutative, so we can write $\sigma = \sum_x Q(x)\,\tau_*(x) \otimes [x]$, like in the first case, now with states $\tau_*(x)$ on $\mathfrak{X}$. $H(\sigma)$ is calculated as before, but now $\operatorname{Tr}_{\mathfrak{X}}\sigma = \sum_x Q(x)[x] = Q$, and hence $H(\varphi|\psi) = \sum_x Q(x) H(\tau_*(x)) \ge 0$. □

Remark A.21 From the proof we see that the commutativity of $\mathfrak{X}$ or $\mathfrak{Y}$ enters in the representation of $\sigma$ as a particular separable state with respect to the *-subalgebras $\mathfrak{X}, \mathfrak{Y}$ (see definition below), namely with one party admitting common diagonalization of her states. We formulate as a conjecture the more general:

$H(\mathfrak{X}|\mathfrak{Y}) \ge 0$ if $\rho$ is separable with respect to $\mathfrak{X}$ and $\mathfrak{Y}$.

From this it would follow that in this case $I(\mathfrak{X} \wedge \mathfrak{Y}) \le \min\{H(\mathfrak{X}), H(\mathfrak{Y})\}$ (compare theorem A.19), which we now only get from the commutativity assumption.



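Definition A.22 Call $\rho$ separable with respect to compatible *-subalgebras $\mathfrak{X}_1, \ldots, \mathfrak{X}_m$ of $\mathfrak{A}$, if, for the natural multiplication map $\mu : \mathfrak{X}_1 \otimes \cdots \otimes \mathfrak{X}_m \to \mathfrak{A}$, $\mu_*\rho$ is a separable state on $\mathfrak{X}_1 \otimes \cdots \otimes \mathfrak{X}_m$, i.e. a convex combination of product states $\sigma_1 \otimes \cdots \otimes \sigma_m$, $\sigma_i \in \mathfrak{S}(\mathfrak{X}_i)$. If $\mu_*\rho$ is a product state, we call also $\rho$ a product state with respect to $\mathfrak{X}_1, \ldots, \mathfrak{X}_m$.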



Theorem A.23 (Knowledge decreases uncertainty) Let $\varphi : \mathfrak{X} \to \mathfrak{A}$, $\psi : \mathfrak{Y} \to \mathfrak{A}$ be compatible quantum operations, and $\varphi' : \mathfrak{X}' \to \mathfrak{X}$ any quantum operation. Then $H(\psi|\varphi) \le H(\psi|\varphi \circ \varphi')$, and in particular $H(\psi|\varphi) \le H(\psi)$.

Proof. The inequality is obviously equivalent to $I(\psi \wedge \varphi) \ge I(\psi \wedge \varphi \circ \varphi')$, i.e. to theorem A.14. □

Defining $h(x) = -x\log x - (1-x)\log(1-x)$ for $x \in [0,1]$ we have the famous

Theorem A.24 (Fano inequality) Let $\rho$ be a state on $\mathfrak{A}$, and $\mathfrak{Y}$ be a *-subalgebra of $\mathfrak{A}$, compatible with the observable $X$ (indexed by $\mathcal{X}$). Then for any observable $Y$ with values in $\mathfrak{Y}$ the probability that "$X \neq Y$", i.e. $P_e = 1 - \operatorname{Tr}\bigl(\rho \sum_j X_j Y_j\bigr)$, satisfies

$$H(X|\mathfrak{Y}) \le h(P_e) + P_e \log(|\mathcal{X}| - 1).$$

Proof. By the previous theorem A.23 it suffices to prove the inequality with $H(X|Y)$ instead of $H(X|\mathfrak{Y})$. But then we have the classical Fano inequality: the uncertainty on $X$ given $Y$ may be estimated by the uncertainty of the event that they are equal plus the uncertainty on the value of $X$ if they are not. □
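The classical inequality to which the proof reduces can be checked directly on a joint distribution. The following is a minimal sketch under stated assumptions: $X$ and $Y$ take values in the same finite alphabet with at least two elements, the joint distribution is given as a nested list, and entropies are in bits. The name `fano_check` is hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def fano_check(joint):
    """joint[x][y] = Pr(X = x, Y = y) on a common alphabet of size >= 2.
    Returns the pair (H(X|Y), h(P_e) + P_e log(|X| - 1))."""
    size = len(joint)
    p_y = [sum(joint[x][y] for x in range(size)) for y in range(size)]
    h_xy = entropy(p for row in joint for p in row)
    h_x_given_y = h_xy - entropy(p_y)
    # Error probability: the event that X and Y differ.
    p_err = 1.0 - sum(joint[x][x] for x in range(size))
    bound = entropy([p_err, 1.0 - p_err]) + p_err * math.log2(size - 1)
    return h_x_given_y, bound

# Uniform input on three symbols, Y equal to X with probability 0.8 and
# equal to each of the other two symbols with probability 0.1.
joint = [[(0.8 if x == y else 0.1) / 3 for y in range(3)] for x in range(3)]
lhs, rhs = fano_check(joint)
print(f"H(X|Y) = {lhs:.4f} bits, Fano bound = {rhs:.4f} bits")
```

In this symmetric example the two sides coincide up to rounding, which is the known case of equality in the classical inequality; for less symmetric joint distributions the bound is typically strict.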



Corollary A.25 Let $\mathfrak{X}$ be a commutative *-subalgebra compatible with $\mathfrak{Y}$, and $X$ the (uniquely determined) maximal observable on $\mathfrak{X}$, $P_e$ as in the theorem, then

$$H(\mathfrak{X}|\mathfrak{Y}) \le h(P_e) + P_e \log\bigl(\operatorname{Tr}\operatorname{supp}(\rho|_{\mathfrak{X}}) - 1\bigr).$$

Proof. First observe that $H(\mathfrak{X}|\mathfrak{Y}) = H(X|\mathfrak{Y})$. To apply the theorem we only have to restrict the range of $X$ to those values that are actually assumed. □

Some philosophical remarks may be in order: quantum theory stipulates the channel as a process, an asymmetric notion, and this brings about the formula $I(P;\varphi_*) = H(P\varphi_*) - H(\varphi_*|P)$: input, average and conditional output entropy. In classical information theory however we like to see things more symmetric, namely the channel as a stochastic two-end system, with some underlying joint distribution. Following this idea produces our channel states $\gamma$, and a symmetric "information" expression $I(\mathfrak{X} \wedge \mathfrak{Y})$. Even though there are questions in quantum information where these two pictures can be related, for example in the above results (a connection that was noticed before by Hall (1997) in his investigation of what he calls context mappings), they are not reducible to each other: the "dynamic" picture is asymmetric (there may not even exist a backward channel producing the same channel state), whereas the "static" picture is obviously symmetric. Even worse, for a joint state it is not obvious that a channel and input distribution generating it exist at all. And if they exist, they are not uniquely determined. On the other hand, modelling a situation of quantum evolutions statically may produce unphysical effects, see the example from Winter (1998c), VIII.B.2, pp. 24: the channel state incorporates parts of a system which can never be simultaneously accessible.